MOTION PLANNING VIA OPTIMAL CONTROL FOR STOCHASTIC PROCESSES



PEYMAN MOHAJERIN ESFAHANI, DEBASISH CHATTERJEE, AND JOHN LYGEROS 



Abstract. We study stochastic motion planning problems which involve a controlled pro- 
cess, with possibly discontinuous sample paths, visiting certain subsets of the state-space 
while avoiding others in a sequential fashion. For this purpose, we first introduce two basic 
notions of motion planning, and then establish a connection to a class of stochastic optimal 
control problems concerned with sequential stopping-times. A weak dynamic programming 
principle (DPP) is then proposed, which characterizes the set of initial states that admit the 
existence of a policy enabling the process to execute the desired maneuver with probability 
at least as much as some pre-specified value. The proposed DPP consists of some auxiliary 
value functions defined in terms of discontinuous payoff functions. An application of the DPP 
is demonstrated in the context of controlled diffusion processes thereafter. It turns out that 
the aforementioned set of initial states can be characterized as the level set of a discontinuous 
viscosity solution to a sequence of partial differential equations, for which the first one has a 
known boundary condition, while the boundary conditions of the subsequent ones are deter- 
mined by the solutions to the preceding steps. Finally, the generality and flexibility of the 
theoretical results are illustrated with the aid of an example involving biological switches. 



1. Introduction 

Motion planning can be viewed as a scheme of excursions that visit certain sets in a specific
order according to a specified time schedule. In the context of motion planning for controlled
dynamical systems, the central issue is to determine whether there exists an admissible policy
that drives the process through some sets while visiting certain targets in a pre-assigned order
and at scheduled times. In the deterministic setting, motion planning problems have been studied
extensively from different perspectives; here we cite two representative articles [Sus91, CS98] and
refer to the references therein for further details of the literature. In this article we focus on the
stochastic counterpart of motion planning. The basic motion planning problem involving two
targets and obstacle sets has been investigated in different contexts, e.g., from computational 
standpoint in finite probability spaces for a class of continuous-time Markov decision processes 
(CTMDPs) [BHKH05], or in discrete-time stochastic hybrid systems based on a dynamic programming
approach in [CCL11, SL10]. In our earlier works [MECL11] and [MECL12] we focused
on reachability of controlled diffusion processes; the reachable sets in these works were character- 
ized as superlevel sets of value functions which are given by the discontinuous viscosity solutions 
to certain partial differential equations with some associated boundary conditions. Here we con- 
tinue our study by moving beyond reachability to more complex motion planning specifications 
for a larger class of stochastic processes with possibly discontinuous sample paths. 

We first introduce different motion planning scenarios in the context of piecewise continuous 
processes. We address the following natural question: for which set of initial states does there 

Date: November 7, 2012. 

Research supported by the European Commission under the project MoVeS (Grant Number 257005) and 
the HYCON2 Network of Excellence (FP7-ICT-2009-5). The authors are grateful to Andreas Milias-Argeitis for 
helpful discussion on modeling of biological networks and pointers to references. 






exist an admissible policy such that the controlled stochastic process satisfies the motion planning
specifications with a probability greater than a given value $p$? To characterize this set of
initial states, we establish a connection between the motion planning specifications and a class of
stochastic optimal control problems involving sequential stopping times. We shall be concerned
with a stochastic process started and stopped when it hits certain subsets of the state-space, 
or equivalently, the process obtained by concatenating segments of the original process between 
consecutive stopping times. 

Under certain mild assumptions on the admissible policies, the stochastic process, and the sets
involved in the motion planning, we propose a Dynamic Programming Principle (DPP) in which
some auxiliary value functions are required. The DPP is introduced in a weak version in the spirit
of [BT11]; this formulation does not require measurability of the value functions. In the following
sections we shall focus on a class of diffusion processes given as the strong solution of a stochastic
differential equation (SDE), for which the required assumptions of the DPP are investigated. In
light of the proposed DPP, we develop a new framework to characterize the desired initial sets
based on tools from partial differential equations (PDEs). Due to the discontinuities in the value
functions corresponding to these problems, all the PDEs are understood in the generalized sense
of so-called viscosity solutions. It turns out that, to compute the value functions, a series of PDEs
must be solved in which each PDE provides the boundary condition of the succeeding
PDE, i.e., the PDEs are solved in a recursive fashion. In order to numerically compute the
desired initial sets by means of off-the-shelf PDE solvers, some numerical issues are discussed
thereafter.

Mention may be made of the fact that the techniques proposed here suffer from the curse of 
dimensionality. Given a continuous-time controlled stochastic process, e.g., a controlled diffusion, 
one approach to solving motion planning problems may proceed with a discretization of the state- 
space, constructing a controlled Markov chain on the discretized space that approximates the 
original process in a certain way, and then to deal with this Markov chain insofar as motion 
planning is concerned. While it may appear that this discretized setting is conceptually simpler, 
it does not lead to dimensionality reduction, and moreover, introduces the non-trivial issue of 
the quality of the approximation and the associated errors due to the discretization involved. 
Furthermore, to our knowledge, there appears to be no off-the-shelf software that algorithmically 
leads to the aforementioned discretization. In contrast, our techniques deal directly with the 
given controlled stochastic process. The implementation of our techniques requires
off-the-shelf PDE solvers, whose design is a well-studied topic. The errors in our numerical solutions
arise only from the PDE solvers.

The article is organized as follows: in §2 we formally introduce the stochastic motion planning 
problems on the prescribed probability space. In §3 we establish a connection between the
motion planning problems and a class of stochastic optimal control problems, for which a weak
DPP is proposed in §4 in terms of some auxiliary value functions. An application of the proposed
DPP is illustrated in §5, which leads to an alternative characterization of the motion planning
objective by a series of PDEs in a recursive fashion. To validate the performance of the proposed
methodology, in §6 the theoretical results are applied to a biological two-gene network, where 
the quality of the biological switch is investigated. For better readability, some of the technical 
proofs of §3 and §5 are moved to Appendix A and B, respectively. 



Notation 

For the convenience of the reader, we provide here a partial list of notation, which will also be
explained in more detail later in the article:
• $\wedge$ (resp. $\vee$): minimum (resp. maximum) operator;

• $A^c$ (resp. $A^\circ$): complement (resp. interior) of the set $A$;

• $B_r(x)$: open Euclidean ball centered at $x$ with radius $r$;

• $\mathfrak{B}(A)$: Borel $\sigma$-algebra on a topological space $A$;

• $\mathcal{U}_t$: set of admissible policies at time $t$;

• $(X_s^{t,x;u})_{s \ge 0}$: stochastic process under the control policy $u$, with the convention $X_s^{t,x;u} := x$ for all $s \le t$;

• $(W_i \rightsquigarrow G_i)_{\le T_i}$ (resp. $W_i \xrightarrow{T_i} G_i$): motion-planning events of reaching $G_i$ sometime before time $T_i$ (resp. at time $T_i$) while staying in $W_i$, see Definition 2.1;

• $(\Theta_i^{A_{k:n}})_{i=k}^n$: sequential exit-times from the sets $(A_i)_{i=k}^n$ in order, see Definition 3.1;

• $V^*$ (resp. $V_*$): upper (resp. lower) semicontinuous envelope of the function $V$;

• $\mathcal{L}^u$: Dynkin operator, see Definition 5.5.



2. General Setting and Problem Description

Consider a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ whose filtration $\mathbb{F} := (\mathcal{F}_s)_{s \ge 0}$ is generated by an $\mathbb{R}^{d_z}$-valued process $z := (z_s)_{s \ge 0}$ with independent increments. Let this natural filtration be enlarged by its right-continuous completion, i.e., it satisfies the usual conditions of completeness and right continuity [KS91, p. 48]. For future purposes, we introduce an auxiliary subfiltration $\mathbb{F}_t := (\mathcal{F}_{t,s})_{s \ge 0}$, where $\mathcal{F}_{t,s}$ is the $\mathbb{P}$-completion of $\sigma(z_r - z_t,\ t \le r \le t \vee s)$. Note that for $s \le t$, $\mathcal{F}_{t,s}$ is the trivial $\sigma$-algebra, and any $\mathcal{F}_{t,s}$-measurable random variable is independent of $\mathcal{F}_t$. By definition, $\mathcal{F}_{t,s} \subseteq \mathcal{F}_s$, with equality in case $t = 0$.

The object of our study is an $\mathbb{R}^d$-valued controlled random process $(X_s^{t,x;u})_{s \ge t}$, initialized at $(t,x)$ under the control policy $u \in \mathcal{U}_t$, where $\mathcal{U}_t$ is the set of admissible policies at time $t$. Let $T > 0$ be a fixed time horizon, and let $\mathbb{S} := [0,T] \times \mathbb{R}^d$. Throughout this work we assume that for every $(t,x) \in \mathbb{S}$ and $u \in \mathcal{U}_t$, the process $(X_s^{t,x;u})_{s \ge t}$ is $\mathbb{F}$-adapted with RCLL sample paths.$^1$ We denote by $\mathcal{T}$ the collection of all $\mathbb{F}$-stopping times; for $\tau_1, \tau_2 \in \mathcal{T}$ with $\tau_1 \le \tau_2$ $\mathbb{P}$-a.s., we let $\mathcal{T}_{[\tau_1,\tau_2]}$ denote the collection of all $\mathbb{F}_{\tau_1}$-stopping times $\tau$ such that $\tau_1 \le \tau \le \tau_2$ $\mathbb{P}$-a.s. Measurability on $\mathbb{R}^d$ will always refer to Borel measurability, and $\mathfrak{B}(A)$ stands for the Borel $\sigma$-algebra on a topological space $A$. Throughout this article all (in)equalities between random variables are understood in the almost sure sense.

Given sets $(W_i, G_i) \in \mathfrak{B}(\mathbb{R}^d) \times \mathfrak{B}(\mathbb{R}^d)$ for $i \in \{1, \dots, n\}$, we are interested in the set of initial conditions $(t,x) \in \mathbb{S}$ for which there exists an admissible strategy $u \in \mathcal{U}_t$ steering the process $X_\cdot^{t,x;u}$ through the sets $(W_i)_{i=1}^n$ while visiting $(G_i)_{i=1}^n$ in a pre-assigned order. Here $W_i$ and $G_i$ stand for "Way" and "Goal", respectively. One may pose this objective from different perspectives based on different time scheduling for the excursions between the sets. We formally introduce some of these notions, which will be addressed throughout this article.

Definition 2.1 (Motion-Planning Events). Consider a fixed initial condition $(t,x) \in \mathbb{S}$ and admissible policy $u \in \mathcal{U}_t$. Given a sequence of pairs $(W_i,G_i)_{i=1}^n \subset \mathfrak{B}(\mathbb{R}^d) \times \mathfrak{B}(\mathbb{R}^d)$ and horizon times $(T_i)_{i=1}^n \subset [t,T]$, we introduce the following motion-planning events:

(1a) $\{X_\cdot^{t,x;u} \models [(W_1 \rightsquigarrow G_1) \circ \cdots \circ (W_n \rightsquigarrow G_n)]_{\le T}\} := \{\exists (s_i)_{i=1}^n \subset [t,T] \mid X_{s_i}^{t,x;u} \in G_i \text{ and } X_r^{t,x;u} \in W_i \setminus G_i,\ \forall r \in [s_{i-1}, s_i[,\ \forall i \le n\}$,

(1b) $\{X_\cdot^{t,x;u} \models (W_1 \xrightarrow{T_1} G_1) \circ \cdots \circ (W_n \xrightarrow{T_n} G_n)\} := \{X_{T_i}^{t,x;u} \in G_i \text{ and } X_r^{t,x;u} \in W_i,\ \forall r \in [T_{i-1}, T_i],\ \forall i \le n\}$,

where in the above definitions $s_0 = T_0 := t$.

1 That is, processes with paths that are right continuous with left limits.

The set in (1a), roughly speaking, contains those events in which the trajectory $X_\cdot^{t,x;u}$, initialized at $(t,x) \in \mathbb{S}$ and controlled via $u \in \mathcal{U}_t$, succeeds in visiting the sets $(G_i)_{i=1}^n$ in a certain order, while the entire duration between the two visits to $G_{i-1}$ and $G_i$ is spent in $W_i$, all within the time horizon $T$. In other words, the journey from $G_{i-1}$ to the next destination $G_i$ must belong to the way $W_i$ for all $i$. Figure 1(a) depicts a sample path that successfully contributes to the first three phases of the excursion in the sense of (1a).




(a) A sample path satisfying the first three phases of the specification in the sense of (1a)
(b) A sample path satisfying the first three phases of the specification in the sense of (1b)

Figure 1. Sample paths of the process $X_\cdot^{t,x;u}$ for a fixed policy $u \in \mathcal{U}_t$

In the case of (1b), the set of paths is more restricted in comparison to (1a). Indeed, not only is the trajectory confined to the ways $W_i$, but there is also a time schedule $(T_i)_{i=1}^n$ that a priori forces the process to be at the goal sets $G_i$ at the specific times $(T_i)_{i=1}^n$. Figure 1(b) demonstrates one sample path in which the first three phases of the excursion are successfully fulfilled.

Note that once a trajectory belonging to the set in (1a) visits $G_i$ for the first time, it is required to remain in the way $W_{i+1}$ until the next goal $G_{i+1}$ is reached, whereas a trajectory belonging to the set in definition (1b) may visit the destination $G_i$ several times, while staying in $W_i$, until the intermediate time $T_i$. The only requirement, in contrast to (1a), is that the trajectory be at the goal $G_i$ at the time $T_i$.

As an illustration, one can easily verify that the successful sample path in Figure 1(b) indeed violates the requirements of definition (1a), as it leaves $W_2$ after it visits the goal set $G_1$ for the first time. In other words, definition (1a) changes the admissible way set from $W_i$ to $W_{i+1}$ immediately after the trajectory visits the goal set $G_i$, while definition (1b) changes the admissible way set only after the intermediate time $T_i$, irrespective of whether the trajectory visits the goal set $G_i$ prior to $T_i$.
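The difference between the two events can be made concrete on a sampled trajectory. The following is a minimal sketch, assuming a path observed only at finitely many grid times and sets given as membership predicates; the function names `satisfies_1a` and `satisfies_1b` and the one-dimensional example sets are hypothetical and serve only as an illustration of (1a) and (1b), not as part of the method developed in this article.

```python
def satisfies_1a(times, path, ways, goals, horizon):
    """Discrete-time check of event (1a): hit G_1, ..., G_n in order before
    `horizon`, staying in W_i (and outside G_i) until G_i is hit."""
    phase, n = 0, len(goals)
    for s, x in zip(times, path):
        if s > horizon:
            break
        # One sample may complete several phases, since s_i = s_{i-1} is allowed.
        while phase < n and goals[phase](x):
            phase += 1
        if phase == n:
            return True
        if not ways[phase](x):
            return False
    return False


def satisfies_1b(times, path, ways, goals, schedule, start):
    """Discrete-time check of event (1b): stay in W_i on [T_{i-1}, T_i] and be
    in G_i exactly at T_i. Assumes every T_i is one of the grid times."""
    prev = start
    for way, goal, t_i in zip(ways, goals, schedule):
        segment = [(s, x) for s, x in zip(times, path) if prev <= s <= t_i]
        if not segment or segment[-1][0] != t_i:
            return False
        if not all(way(x) for _, x in segment):
            return False
        if not goal(segment[-1][1]):
            return False
        prev = t_i
    return True


# Hypothetical one-dimensional example: the path visits G_1 early, then
# leaves W_2 (while still inside W_1) before the scheduled time T_1 = 3.
ways = [lambda x: -1.0 < x < 10.0, lambda x: 1.5 < x < 10.0]
goals = [lambda x: 2.0 <= x <= 3.0, lambda x: 6.0 <= x <= 7.0]
times = [0, 1, 2, 3, 4, 5, 6]
path = [0.0, 2.5, 0.5, 2.5, 4.0, 5.0, 6.5]

ok_1a = satisfies_1a(times, path, ways, goals, horizon=6)
ok_1b = satisfies_1b(times, path, ways, goals, schedule=[3, 6], start=0)
```

In this example the path satisfies (1b) but not (1a), which mirrors the situation described for Figure 1(b).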
From the technical standpoint, if the target set $G_i$ is not closed, then it is not difficult to see that there could be some continuous transitions through the boundary of the goal $G_i$ that are not admissible in view of definition (1a), since the trajectory must reside in $W_i \setminus G_i$ for the whole interval $[s_{i-1}, s_i[$ and hit the set $G_i$ exactly at the time $s_i$. Notice that we do not need to consider this issue for the set in definition (1b), since in this case the trajectory only visits the sets $G_i$ at the specific times $T_i$, while any continuous transition and maneuver inside the target sets $G_i$ are allowed. In order to address the aforementioned issue, we may impose the following:

Assumption 2.2. The sets $(G_i)_{i=1}^n \subset \mathfrak{B}(\mathbb{R}^d)$ are closed.

Having discussed the properties of motion planning as detailed in Definition 2.1, one may conclude that in general neither of the definitions in (1) is more restrictive than the other. However, in the particular case where the family of way sets $(W_i)_{i=1}^n$ is nested, i.e., $W_i \subset W_{i+1}$, one can see that the motion planning definition (1b) actually imposes more constraints on the trajectories of the random process, in light of the right continuity of $X_\cdot^{t,x;u}$ and Assumption 2.2. The following fact formally addresses this issue.

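Fact 2.3. Consider a family of set pairs $(W_i,G_i)_{i=1}^n \subset \mathfrak{B}(\mathbb{R}^d) \times \mathfrak{B}(\mathbb{R}^d)$ where $(G_i)_{i=1}^n$ satisfies Assumption 2.2 and $(W_i)_{i=1}^n$ is nested, i.e., $W_i \subset W_{i+1}$. Then, for all initial conditions $(t,x) \in \mathbb{S}$, intermediate times $(T_i)_{i=1}^n \subset [t,T]$, and control policies $u \in \mathcal{U}_t$, it holds that

$\{X_\cdot^{t,x;u} \models (W_1 \xrightarrow{T_1} G_1) \circ \cdots \circ (W_n \xrightarrow{T_n} G_n)\} \subset \{X_\cdot^{t,x;u} \models [(W_1 \rightsquigarrow G_1) \circ \cdots \circ (W_n \rightsquigarrow G_n)]_{\le T}\}$.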

Proof. Let $\omega$ be contained in the set defined in (1b). This means that $X_{T_i}^{t,x;u}(\omega) \in G_i$, and for all $r \in [T_{i-1}, T_i]$ we have $X_r^{t,x;u}(\omega) \in W_i$. Due to the right continuity of the sample paths and Assumption 2.2, one can see that for all $i \in \{1, \dots, n\}$ there exists $s_i \in [T_{i-1}, T_i]$ such that $X_{s_i}^{t,x;u}(\omega) \in G_i$ while $X_r^{t,x;u}(\omega) \in W_i \setminus G_i$ for all $r \in [T_{i-1}, s_i[$, where $T_0 := t$. Since the sets $(W_i)_{i=1}^n$ are nested, $X_r^{t,x;u}(\omega) \in W_{i-1} \subset W_i$ for all $r \in [s_{i-1}, T_{i-1}]$. An induction argument quickly leads to

$X_{s_i}^{t,x;u}(\omega) \in G_i$,

$X_r^{t,x;u}(\omega) \in W_i \setminus G_i, \quad \forall r \in [s_{i-1}, s_i[$,

where $s_0 := t$. In fact one may introduce $s_i$ as the first hitting time of the set $G_i$ after time $T_{i-1}$.$^2$ This implies that $\omega$ is also contained in the set defined in (1a) and proves the assertion. □

Let us note that to satisfy the order of visiting the goal/target sets $(G_i)_{i=1}^n$, it suffices to exclude the target sets $(G_i)_{i=k+1}^n$ from the set $W_k$. Therefore, the hypothesis in Fact 2.3 concerning the nested way sets $W_i$ does not really impose any restriction on the motion planning objectives.

Remark 2.4. The motion planning scenarios for only two sets $(W_1, G_1)$ are essentially the basic reachability maneuver that was studied in our earlier work [MECL12]. Definition (1a) suggests the same Reach-Avoid problem as in [MECL12, Definition 2.4], where the target and obstacle sets are $G_1$ and $\mathbb{R}^d \setminus W_1$, respectively. In this special case, definition (1b) also follows the same concept as the Reach-Avoid problem in [MECL12, Definition 3.4] with the same target and obstacle sets.

Remark 2.5. A particular case of Definition 2.1 is the following: $G_i := A_{i+1} \setminus A_i$ and $G_i := A_{i+1} \cap A_i$ for the definitions (1a) and (1b), respectively. Here the motion planning objective is to pass through $n$ given sets $(A_i)_{i=1}^n$ in a certain order.



2 The formal definition of this stopping time and its application are considered later for Fact 3.3.
The events introduced in Definition 2.1 depend, of course, on the control policy $u \in \mathcal{U}_t$ and the initial condition $(t,x) \in \mathbb{S}$. The central objective of this work is to determine the set of initial conditions $x \in \mathbb{R}^d$ for which there exists an admissible policy $u$ such that the probability of the above motion-planning events is higher than a certain threshold. To this end, we formally introduce these sets as follows:

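Definition 2.6 (Motion-Planning Initial Set). Consider a fixed initial time $t \in [0,T]$. Given a sequence of set pairs $(W_i,G_i)_{i=1}^n \subset \mathfrak{B}(\mathbb{R}^d) \times \mathfrak{B}(\mathbb{R}^d)$ and horizon times $(T_i)_{i=1}^n \subset [t,T]$, we define the following motion-planning initial sets:

(2a) $\mathrm{PP}(t,p;(W_i,G_i)_{i=1}^n,T) := \{x \in \mathbb{R}^d \mid \exists u \in \mathcal{U}_t : \mathbb{P}\{X_\cdot^{t,x;u} \models [(W_1 \rightsquigarrow G_1) \circ \cdots \circ (W_n \rightsquigarrow G_n)]_{\le T}\} > p\}$,

(2b) $\widetilde{\mathrm{PP}}(t,p;(W_i,G_i)_{i=1}^n,(T_i)_{i=1}^n) := \{x \in \mathbb{R}^d \mid \exists u \in \mathcal{U}_t : \mathbb{P}\{X_\cdot^{t,x;u} \models (W_1 \xrightarrow{T_1} G_1) \circ \cdots \circ (W_n \xrightarrow{T_n} G_n)\} > p\}$.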
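Membership in a motion-planning initial set can be probed from below by fixing one policy and estimating the probability of the event by simulation: if some policy achieves a probability above $p$, the initial state belongs to the set. The following is a rough sketch for a single phase ($n = 1$) of type (1a), assuming a hypothetical scalar model $dX_s = u\,ds + \sigma\,dW_s$ discretized by an Euler scheme; the model, the function name `estimate_success_probability`, and all parameter values are assumptions made for illustration and are not taken from this article.

```python
import random


def estimate_success_probability(x0, policy, in_way, in_goal,
                                 horizon=1.0, dt=0.01, sigma=0.5,
                                 n_paths=2000, seed=0):
    """Monte Carlo estimate of P{reach G before leaving W, within horizon}
    for the assumed model dX = policy(t, X) dt + sigma dW (Euler scheme)."""
    rng = random.Random(seed)
    steps = int(round(horizon / dt))
    hits = 0
    for _ in range(n_paths):
        x = x0
        for k in range(steps + 1):
            if in_goal(x):          # goal reached: the phase is completed
                hits += 1
                break
            if not in_way(x):       # left the way set before reaching the goal
                break
            noise = sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
            x += policy(k * dt, x) * dt + noise
    return hits / n_paths


# Hypothetical sets: way W = (0, infinity), goal G = [1, infinity).
in_way = lambda x: x > 0.0
in_goal = lambda x: x >= 1.0

p_toward = estimate_success_probability(0.5, lambda t, x: 1.0, in_way, in_goal)
p_away = estimate_success_probability(0.5, lambda t, x: -1.0, in_way, in_goal)
```

Such an estimate only certifies membership for the particular policy tried and is subject to sampling and discretization error; characterizing the whole set requires the supremum over all admissible policies, which is what the value functions introduced in the next section provide.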

3. Connection to Stochastic Optimal Control 

In this section we establish a connection between the stochastic motion-planning initial sets $\mathrm{PP}$ and $\widetilde{\mathrm{PP}}$, defined in Definition 2.6, and a class of stochastic optimal control problems involving stopping times. We introduce a sequence of random times corresponding to the times at which the process $X_\cdot^{t,x;u}$ first exits from a sequence of sets, one after another, in a certain order:

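Definition 3.1. Given an initial condition $(t,x) \in \mathbb{S}$ and a sequence of measurable sets $(A_i)_{i=k}^n \subset \mathfrak{B}(\mathbb{R}^d)$, the sequence of random times $(\Theta_i^{A_{k:n}})_{i=k}^n$ defined$^3$ by

$\Theta_i^{A_{k:n}}(t,x) := \inf\{r \ge \Theta_{i-1}^{A_{k:n}}(t,x) : X_r^{t,x;u} \notin A_i\}, \qquad \Theta_{k-1}^{A_{k:n}}(t,x) := t$,

is called the sequential exit-time through the sets $A_k$ to $A_n$.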

Note that the sequential exit-time $\Theta_i^{A_{k:n}}$ depends on the control policy $u$ in addition to the initial condition $(t,x)$, but here and in the sequel we shall suppress this dependence. For notational simplicity, we also drop $(t,x)$ in the subsequent sections.
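The recursion in Definition 3.1 can be evaluated directly on a sampled path. The following is a small sketch, assuming a path known only at finitely many grid times and sets given as membership predicates; the helper name `sequential_exit_times` and the example sets are hypothetical, and the infimum over an empty set is taken to be infinity.

```python
import math


def sequential_exit_times(times, path, sets, start):
    """Theta_i = inf{r >= Theta_{i-1} : X_r not in A_i}, Theta_{k-1} = start,
    evaluated on the grid `times`; returns math.inf if no exit is observed."""
    exits = []
    prev = start
    for in_set in sets:
        theta = math.inf
        for s, x in zip(times, path):
            if s >= prev and not in_set(x):
                theta = s
                break
        exits.append(theta)
        prev = theta
    return exits


# Hypothetical one-dimensional sets A_1, A_2, A_3 and a sampled path.
in_a1 = lambda x: x < 1.0
in_a2 = lambda x: abs(x) < 2.0
in_a3 = lambda x: 1.0 < x < 3.0
times = [0, 1, 2, 3, 4, 5]
path = [0.5, -2.5, 0.5, 1.5, 2.5, 3.5]

exits_k1 = sequential_exit_times(times, path, [in_a1, in_a2, in_a3], start=0)
exits_k2 = sequential_exit_times(times, path, [in_a2, in_a3], start=0)
exits_k3 = sequential_exit_times(times, path, [in_a3], start=0)

# Restarting at a time theta = 3 with Theta_1 <= theta < Theta_2 (for k = 1)
# and using only the remaining sets A_2, A_3.
restarted = sequential_exit_times(times, path, [in_a2, in_a3], start=3)
```

Here the exit-time of $A_2$ depends on $k$: for $k = 1$ it is counted only after the process has left $A_1$, whereas for $k = 2$ it is counted from the start. The restarted computation reproduces the last two exit-times of the case $k = 1$, which is the content of Lemma 3.2 in this discrete setting.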

In Figure 2 a sample path of the process $X_\cdot^{t,x;u}$ along with the sequential exit-times $(\Theta_i^{A_{k:3}})_{i=k}^3$ is depicted for different $k \in \{1, 2, 3\}$. Note that since the initial condition $x$ does not belong to $A_3$, the first exit-time of the set $A_3$ is indeed the start time $t$, i.e., $\Theta_3^{A_{3:3}} = t$. Let us highlight the difference between the stopping times $\Theta_2^{A_{1:3}}$ and $\Theta_2^{A_{2:3}}$. The former is the first exit-time of the set $A_2$ after the time that the process leaves $A_1$, whereas the latter is the first exit-time of the set $A_2$ from the very beginning.$^4$

Given a stopping time $\theta \in \mathcal{T}_{[t,T]}$, let $\omega \in \Omega$ be a sample realization such that the process $(X_s^{t,x;u}(\omega))_{t \le s \le \theta(\omega)}$ successfully visits the first sets $(A_j)_{j=k}^{i-1}$. For this $\omega$, it then follows from Definition 3.1 that the sequential exit-times $\Theta_j^{A_{k:n}}$ of visiting the sets $(A_j)_{j=i}^n$ starting at the initial condition $(t,x)$ are the same as the sequential exit-times $\Theta_j^{A_{i:n}}$ of visiting the sets $(A_j)_{j=i}^n$ starting at $(\theta(\omega), X_{\theta(\omega)}^{t,x;u}(\omega))$. Figure 3 depicts one of these sample paths where $\Theta_1^{A_{1:3}}(\omega) \le \theta(\omega) < \Theta_2^{A_{1:3}}(\omega)$. It is obvious from the figure that starting from an initial condition $(t,x)$, the process views the exit-times of $A_2$ and $A_3$ after leaving $A_1$ as if it starts from the initial condition $(\theta, X_\theta^{t,x;u})$ and proceeds to exit from the sets $A_2$ and $A_3$ irrespective of the history before $\theta$. Lemma 3.2 formally presents this result and will be employed later to prove one of the main results of this article, i.e., Theorem 4.5.

Figure 2. Sequential exit-times of a sample path through the sets $(A_i)_{i=k}^3$ for different values of $k$

Figure 3. Sequential exit-times corresponding to different initial conditions

3 By convention, $\inf \emptyset = \infty$.

4 In §5 we shall see that these differences will lead to different definitions of value functions in order to derive a dynamic programming argument.

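Lemma 3.2. Consider a control policy $u \in \mathcal{U}_t$ and initial condition $(t,x) \in \mathbb{S}$. Given a sequence of measurable sets $(A_i)_{i=k}^n \subset \mathfrak{B}(\mathbb{R}^d)$ and stopping time $\theta \in \mathcal{T}_{[t,T]}$, for all $k \in \{1, \dots, n\}$ and $j \ge i \ge k$ we have

$\Theta_j^{A_{k:n}}(t,x) = \Theta_j^{A_{i:n}}(\theta, X_\theta^{t,x;u}) \quad \text{on} \quad \{\Theta_{i-1}^{A_{k:n}}(t,x) \le \theta < \Theta_i^{A_{k:n}}(t,x)\}$.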

We need to establish that the sequential stopping-times are well-defined, hence: 

Fact 3.3 (Measurability). Consider a sequence of measurable sets $(A_i)_{i=1}^n \subset \mathfrak{B}(\mathbb{R}^d)$ and initial condition $(t,x) \in \mathbb{S}$. The sequential exit-time $\Theta_i^{A_{1:n}}(t,x)$ is an $\mathbb{F}_t$-stopping time for all $i \in \{1, \dots, n\}$, i.e., $\{\Theta_i^{A_{1:n}}(t,x) \le s\} \in \mathcal{F}_{t,s}$ for all $s \ge 0$.

Proof. Let $\tau_{A_i}$ be the first exit-time from the set $A_i$:

(3) $\tau_{A_i}(t,x) := \inf\{s \ge 0 : X_{t+s}^{t,x;u} \notin A_i\}$.

We know that $\tau_{A_i}$ is an $\mathbb{F}_t$-stopping time [EK86, Theorem 1.6, Chapter 2]. Let $\omega(\cdot) \mapsto \vartheta_s(\omega(\cdot)) := \omega(s + \cdot)$ be the time-shift operator. From the definition it follows that for all $i \ge 0$

Now the assertion follows directly in light of the measurability of the mapping $\vartheta$ and right continuity of the filtration $\mathbb{F}_t$.$^5$ □

For technical reasons we stipulate that the way sets satisfy the following: 



5 See, for instance, [EK86, Proposition 1.4, Chapter 2] for more details in this regard.
Assumption 3.4. The sets $(W_i)_{i=1}^n \subset \mathfrak{B}(\mathbb{R}^d)$ are open.

Under Assumption 3.4, the following fact is an immediate consequence of right continuity of the stochastic process $X_\cdot^{t,x;u}$:

Fact 3.5. Fix a control policy $u \in \mathcal{U}_t$ and an initial condition $(t,x) \in \mathbb{S}$. Let measurable sets $(W_i)_{i=1}^n \subset \mathfrak{B}(\mathbb{R}^d)$ be given, and suppose that Assumption 3.4 holds for $(W_i)_{i=1}^n$. Then, for all $i \in \{1, \dots, n\}$

where $(\Theta_i^{W_{1:n}})_{i=1}^n$ are sequential exit-times introduced as in Definition 3.1.

Given $(W_i, G_i, T_i)_{i=1}^n \subset \mathfrak{B}(\mathbb{R}^d) \times \mathfrak{B}(\mathbb{R}^d) \times [t,T]$, we introduce two value functions $V, \widetilde{V} : \mathbb{S} \to [0,1]$ defined by



(4a) V(t,x) := sup E 
ueu t 



(4b) V(t,x) := sup E 
ueu t 



$\eta_i := \Theta_i^{B_{1:n}} \wedge T, \qquad B_i := W_i \setminus G_i$,

where $\Theta_i^{B_{1:n}}$, $\Theta_i^{W_{1:n}}$ are the sequential exit-times in the sense of Definition 3.1. Figures 4(a) and 4(b) illustrate the sequential exit-times corresponding to the sets $B_i$ and $W_i$, respectively; the sample trajectories are the same as in Figure 1. The main result of this section, Proposition 3.6 below, establishes a connection between the sets $\mathrm{PP}$, $\widetilde{\mathrm{PP}}$ and the superlevel sets of the value functions $V$ and $\widetilde{V}$.




(a) Sequential exit-times related to motion-planning event (1a)
(b) Sequential exit-times related to motion-planning event (1b)

Figure 4. Sequential exit-times corresponding to different motion-planning events as introduced in (1)
Proposition 3.6. Fix a probability level $p \in [0,1]$, a sequence of set pairs $(W_i,G_i)_{i=1}^n \subset \mathfrak{B}(\mathbb{R}^d) \times \mathfrak{B}(\mathbb{R}^d)$, an initial time $t \in [0,T]$, and intermediate times $(T_i)_{i=1}^n \subset [t,T]$. Then

(5) $\mathrm{PP}(t,p;(W_i,G_i)_{i=1}^n,T) = \{x \in \mathbb{R}^d \mid V(t,x) > p\}$.

Moreover, suppose Assumption 3.4 holds. Then,

(6) $\widetilde{\mathrm{PP}}(t,p;(W_i,G_i)_{i=1}^n,(T_i)_{i=1}^n) = \{x \in \mathbb{R}^d \mid \widetilde{V}(t,x) > p\}$,

where the value functions $V$ and $\widetilde{V}$ are as defined in (4).

Proof. See Appendix A. □ 

Remark 3.7 (Mixed Motion-Planning Events). In this section we focus on two sets of events as introduced in Definition 2.1, and will continue doing so for our subsequent results. However, it is of interest to consider an event that consists of a mixture of the events in (1), e.g., $(W_1 \rightsquigarrow G_1)_{\le T_1} \circ (W_2 \xrightarrow{T_2} G_2)$. One can observe that essentially the same analytical techniques as the ones proposed here can be employed to address these mixed motion planning objectives, and to establish a connection to a class of optimal control problems with some appropriate sequential stopping times. We shall provide an example of this nature in §6.



4. Dynamic Programming Principle 

The objective of this section is to derive a DPP for the value functions $V$ and $\widetilde{V}$ introduced in (4). For clarity we proceed in a more abstract setting: let $(T_i)_{i=1}^n \subset [0,T]$ be a sequence of times, $(A_i)_{i=1}^n \subset \mathfrak{B}(\mathbb{R}^d)$ be a family of open sets, and $\ell_i : \mathbb{R}^d \to \mathbb{R}$, $i = 1, \dots, n$, be payoff functions that are measurable and bounded. We define the sequence of value functions $V_k : [0,T] \times \mathbb{R}^d \to \mathbb{R}$, $k = 1, \dots, n$,

(7) $V_k(t,x) := \sup_{u \in \mathcal{U}_t} \mathbb{E}\Big[\prod_{i=k}^n \ell_i\big(X_{\tau_i^k}^{t,x;u}\big)\Big], \qquad \tau_i^k(t,x) := \Theta_i^{A_{k:n}}(t,x) \wedge T_i, \quad i \in \{k, \dots, n\}$,

where the stopping times $(\Theta_i^{A_{k:n}})_{i=k}^n$ are sequential exit-times in the sense of Definition 3.1. Notice that the sequential exit-times of the value function $V_k$ correspond to an excursion through the sets $(A_i)_{i=k}^n$ irrespective of the first $(k-1)$ sets. It is straightforward to observe that the value functions $V$ and $\widetilde{V}$ in (4) are particular cases of the value function $V_1$ defined as in (7) for an appropriate selection of the sets $(A_i)_{i=1}^n$, functions $(\ell_i)_{i=1}^n$, and intermediate times $(T_i)_{i=1}^n$.

To state the main result of this section, Theorem 4.5 below, some technical definitions and assumptions concerning the stochastic processes $X_\cdot^{t,x;u}$, admissible strategies $\mathcal{U}_t$, and the payoff functions $\ell_i$ are needed. Let $A$ be a metric space, and let $f : A \to \mathbb{R}$ be a function. The lower and upper semicontinuous envelopes of $f$ are defined, respectively, as

$f_*(x) := \liminf_{x' \to x} f(x'), \qquad f^*(x) := \limsup_{x' \to x} f(x')$.

We denote by $\mathrm{USC}(A)$ and $\mathrm{LSC}(A)$ the collection of all upper-semicontinuous and lower-semicontinuous functions from $A$ to $\mathbb{R}$, respectively.
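For a function known only on a grid, the envelopes can be approximated by taking the minimum and maximum over a small neighborhood of each grid point, since $f_*(x)$ and $f^*(x)$ are the limits of the infimum and supremum of $f$ over balls shrinking to $x$. The following is a rough sketch under that grid assumption; the helper name `envelopes_on_grid` and the step-function example are hypothetical.

```python
def envelopes_on_grid(values, radius=1):
    """Approximate lower/upper semicontinuous envelopes of a sampled function
    by the min/max over grid neighbors within `radius` indices."""
    n = len(values)
    lower, upper = [], []
    for i in range(n):
        window = values[max(0, i - radius): i + radius + 1]
        lower.append(min(window))
        upper.append(max(window))
    return lower, upper


# Samples of an indicator-type function with a single jump.
step = [0.0, 0.0, 1.0, 1.0]
lower, upper = envelopes_on_grid(step)
```

On a finite grid the two envelopes differ over a band of width `radius` around the jump, whereas the exact envelopes differ only at the discontinuity itself; the band shrinks as the grid is refined.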

Assumption 4.1. We stipulate the following assumptions on

a. Admissible control policies:

i. Non-anticipative policy: Given a control set $U \subset \mathbb{R}^{d_u}$, any $u \in \mathcal{U}_t$ is a $U$-valued $\mathbb{F}_t$-adapted stochastic process;

ii. Stability under concatenation: The set of admissible control policies at time $t$, $\mathcal{U}_t$, is closed under concatenation. That is, for all $u_1, u_2 \in \mathcal{U}_t$ and stopping time $\theta \in \mathcal{T}_{[t,T]}$, it holds that

$1_{[t,\theta]} u_1 + 1_{]\theta,T]} u_2 \in \mathcal{U}_t$;

b. Stochastic process $X_\cdot^{t,x;u}$:

i. Causality: For all initial conditions $(t,x) \in \mathbb{S}$, any control policies $u_1, u_2 \in \mathcal{U}_t$, and stopping time $\theta \in \mathcal{T}_{[t,T]}$, it holds that

$1_{[t,\theta]} u_1 = 1_{[t,\theta]} u_2 \implies 1_{[t,\theta]} X_\cdot^{t,x;u_1} = 1_{[t,\theta]} X_\cdot^{t,x;u_2}$;

ii. Strong Markov property: Given initial condition $(t,x) \in \mathbb{S}$, control policy $u \in \mathcal{U}_t$, and stopping time $\theta \in \mathcal{T}_{[t,T]}$, for all bounded measurable functions $\ell : \mathbb{R}^d \to \mathbb{R}$ and $s \ge 0$ it holds that

$\mathbb{E}\big[\ell(X_{\theta+s}^{t,x;u}) \mid \mathcal{F}_\theta\big] = \mathbb{E}\big[\ell(X_{\theta+s}^{t,x;u}) \mid X_\theta^{t,x;u}\big] \quad \mathbb{P}\text{-a.s.}$;

iii. Continuity of exit-time: Given initial condition $(t_0, x_0) \in \mathbb{S}$ and control policy $u \in \mathcal{U}_t$, for all $k \in \{1, \dots, n\}$ and $i \in \{k, \dots, n\}$ the stochastic mapping $(t,x) \mapsto X_{\tau_i^k(t,x)}^{t,x;u}$ is $\mathbb{P}$-a.s. continuous at the point $(t_0, x_0)$, where the stopping time $\tau_i^k$ is defined as in (7);

c. Payoff functions:

$(\ell_i)_{i=1}^n$ are lower semicontinuous, i.e., $\ell_i \in \mathrm{LSC}(\mathbb{R}^d)$ for all $i \le n$.

Remark 4.2. Some remarks on the above assumptions are in order:

• Assumption 4.1.a.i. implies that admissible strategies $u \in \mathcal{U}_t$ take action at time $t$ independently of future information arriving at $s > t$. This is known as a non-anticipative strategy [Bor05], and is a standard assumption.

• Assumption 4.1.a.ii. is also standard for dynamic programming arguments and it holds for a large class of admissible strategies, e.g., progressively measurable processes.

• Assumption 4.1.b. imposes three constraints on the process $X_\cdot^{t,x;u}$ defined on the prescribed probability space: i) causality of the solution processes for a given admissible policy, ii) the strong Markov property, and iii) continuity of the exit-time. The causality property is always satisfied in practical applications; uniqueness of the solution process $X_\cdot^{t,x;u}$ under any admissible control process $u$ guarantees it. The class of Markovian processes is fairly large; for instance, it contains the solutions of SDEs under some mild assumptions on the drift and diffusion terms [Kry09, Theorem 2.9.4]. The almost sure continuity of the exit-time with respect to the initial condition of the process is the only restrictive condition. Note that this condition does not always hold even for deterministic processes with continuous trajectories. One may need to impose conditions on the process and possibly the sets involved in motion planning in order to satisfy continuity of the mapping $(t,x) \mapsto X_{\tau_i^k(t,x)}^{t,x;u}$ at the given initial condition with probability one. We shall elaborate on this issue and its ramifications for a class of diffusions in §5.



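We present two lemmas preparatory to our main result. For $k = 1, \dots, n$, we define the function $J_k : \mathbb{S} \times \mathcal{U} \to \mathbb{R}$ as:

(8) $J_k(t,x;u) := \mathbb{E}\Big[\prod_{i=k}^n \ell_i\big(X_{\tau_i^k}^{t,x;u}\big)\Big]$,

where $(\tau_i^k)_{i=k}^n$ are as defined in (7).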

Lemma 4.3. Consider the value function $V_k$ and $J_k$ as defined in (7) and (8), respectively. Then, for any $k \in \{1, \dots, n\}$

$V_k(t,x) = \sup_{u \in \mathcal{U}} J_k(t,x;u), \qquad \forall (t,x) \in \mathbb{S}$.
Proof. Note that all the admissible policies $u \in \mathcal{U}_t$ are also contained in $\mathcal{U}$. Therefore, the inequality "$\le$" between the left- and right-hand sides is immediate. In order to show the reverse inequality, observe that any control policy $u \in \mathcal{U}$ is a measurable function of the process $(z_s)_{s \ge 0}$. For any fixed $(z_s)_{0 \le s \le t}$ it holds that $u := u\big((z_s)_{0 \le s \le T}\big) = u\big((z_s)_{0 \le s \le t}, (z_s - z_t)_{t \le s \le T}\big)$, where $u$ can be viewed as a policy independent of $\mathcal{F}_t$ due to the independent increments of $z$. By [BT11, Remark 5.2] the inequality "$\ge$" also follows, which proves the lemma.$^6$ □

Lemma 4.4. Under Assumptions 4.1.b.iii. and 4.1.c., the function $\mathbb{S} \ni (t,x) \mapsto J_k(t,x;u) \in \mathbb{R}$ is lower semicontinuous for all $k \in \{1, \dots, n\}$ and control policy $u \in \mathcal{U}$.



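Proof. Fix $k \in \{1, \dots, n\}$. It is obvious that the function $J_k$ is uniformly bounded since the $\ell_k$ are. Therefore,

$\liminf_{(s,y) \to (t,x)} J_k(s,y;u) = \liminf_{(s,y) \to (t,x)} \mathbb{E}\Big[\prod_{i=k}^n \ell_i\big(X_{\tau_i^k(s,y)}^{s,y;u}\big)\Big] \ge \mathbb{E}\Big[\liminf_{(s,y) \to (t,x)} \prod_{i=k}^n \ell_i\big(X_{\tau_i^k(s,y)}^{s,y;u}\big)\Big]$

$\ge \mathbb{E}\Big[\prod_{i=k}^n \ell_i\big(X_{\tau_i^k(t,x)}^{t,x;u}\big)\Big] = J_k(t,x;u)$,

where the inequality in the first line follows from Fatou's lemma, and the inequality in the second line is a direct consequence of Assumptions 4.1.b.iii. and 4.1.c. □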



The following theorem, the main result of this section, establishes a dynamic programming argument for the value function $V_k$ in terms of the "successor" value functions $(V_j)_{j=k+1}^n$, all defined as in (7). For ease of notation we shall introduce deterministic times $\tau_{k-1}^k$, $\tau_{n+1}^k$, and a trivial constant value function $V_{n+1}$.

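Theorem 4.5. Consider the value functions $(V_j)_{j=1}^n$ and the sequential stopping times $(\tau_j^k)_{j=k}^n$ introduced in (7). Under Assumption 4.1, for all initial conditions $(t,x) \in \mathbb{S}$, stopping times $\theta \in \mathcal{T}_{[t,T]}$, and $k \in \{1, \dots, n\}$ we have

(9a) $V_k(t,x) \le \sup_{u \in \mathcal{U}_t} \mathbb{E}\Big[\sum_{j=k}^{n+1} 1_{\{\tau_{j-1}^k \le \theta < \tau_j^k\}} V_j^*\big(\theta, X_\theta^{t,x;u}\big) \prod_{i=k}^{j-1} \ell_i\big(X_{\tau_i^k}^{t,x;u}\big)\Big]$,

(9b) $V_k(t,x) \ge \sup_{u \in \mathcal{U}_t} \mathbb{E}\Big[\sum_{j=k}^{n+1} 1_{\{\tau_{j-1}^k \le \theta < \tau_j^k\}} V_{j*}\big(\theta, X_\theta^{t,x;u}\big) \prod_{i=k}^{j-1} \ell_i\big(X_{\tau_i^k}^{t,x;u}\big)\Big]$,

where $V_j^*$ and $V_{j*}$ are the upper and lower semicontinuous envelopes of the value function $V_j$, respectively, $\tau_{k-1}^k := t$, $V_{n+1} \equiv 1$, and $\tau_{n+1}^k$ can be chosen as any constant time strictly greater than $T$, say $\tau_{n+1}^k := T + 1$.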
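The recursive structure of the theorem, in which each value function is expressed through its successors and $V_{n+1} \equiv 1$ terminates the chain, is easiest to see in a finite setting. The following is a minimal sketch of a discrete-time, finite-state analogue for events of type (1a), with the one-step time playing the role of $\theta$; the random-walk model with a slip probability, the function name `sequential_reach_avoid_values`, and the example sets are all assumptions made for illustration and are not the continuous-time method of this article.

```python
def sequential_reach_avoid_values(n_states, horizon, ways, goals, slip=0.1):
    """Backward recursion for V[k][t][x], the maximal probability of completing
    phases k, ..., n-1 from state x at time t; V[n] is identically 1.
    Assumed model: the chosen move in {-1, 0, +1} succeeds with probability
    1 - slip, otherwise the state stays put; states are clamped to the grid."""
    n = len(goals)
    V = [[[0.0] * n_states for _ in range(horizon + 1)] for _ in range(n)]
    V.append([[1.0] * n_states for _ in range(horizon + 1)])  # V_{n+1} = 1

    def expected(values, x, move):
        target = min(max(x + move, 0), n_states - 1)
        return (1.0 - slip) * values[target] + slip * values[x]

    for k in range(n - 1, -1, -1):
        for t in range(horizon, -1, -1):
            for x in range(n_states):
                if x in goals[k]:
                    # Goal reached: hand over to the successor value function.
                    V[k][t][x] = V[k + 1][t][x]
                elif x not in ways[k] or t == horizon:
                    # Left the way set, or no time left: the phase has failed.
                    V[k][t][x] = 0.0
                else:
                    V[k][t][x] = max(expected(V[k][t + 1], x, m)
                                     for m in (-1, 0, 1))
    return V


# Hypothetical two-phase task on states 0..5: reach state 3, then state 5.
ways = [{0, 1, 2, 3}, {2, 3, 4, 5}]
goals = [{3}, {5}]
V = sequential_reach_avoid_values(6, 10, ways, goals, slip=0.1)
value_from_origin = V[0][0][0]
```

The successor value function enters only on the goal set of the current phase, which is the discrete counterpart of one value function providing the boundary condition for the preceding one. Thresholding `V[0][0]` at a level $p$ then gives the finite-state analogue of the initial set in (2a).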



Proof. The proof extends the main result of our earlier work [MECL12, Theorem 4.7] on the 
so-called "reach-avoid" motion planning maneuver. Based on the tower property of conditional 
expectation [Kal97, Theorem 5.1], we have 

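$$
\begin{aligned}
\mathbb{E}\Big[\prod_{i=k}^{n} \ell_i\big(X^{t,x;u}_{\tau_i^k}\big) \,\Big|\, \mathcal{F}_\theta\Big]
 &= \sum_{j=k}^{n+1} \mathbf{1}_{\{\tau^k_{j-1} \le \theta < \tau^k_j\}}\, \mathbb{E}\Big[\prod_{i=j}^{n} \ell_i\big(X^{t,x;u}_{\tau_i^k}\big) \,\Big|\, \mathcal{F}_\theta\Big] \prod_{i=k}^{j-1} \ell_i\big(X^{t,x;u}_{\tau_i^k}\big) && \text{(10a)} \\
 &= \sum_{j=k}^{n+1} \mathbf{1}_{\{\tau^k_{j-1} \le \theta < \tau^k_j\}}\, J_j\big(\theta, X^{t,x;u}_\theta; u\big) \prod_{i=k}^{j-1} \ell_i\big(X^{t,x;u}_{\tau_i^k}\big) && \text{(10b)} \\
 &\le \sum_{j=k}^{n+1} \mathbf{1}_{\{\tau^k_{j-1} \le \theta < \tau^k_j\}}\, V_j\big(\theta, X^{t,x;u}_\theta\big) \prod_{i=k}^{j-1} \ell_i\big(X^{t,x;u}_{\tau_i^k}\big),
\end{aligned}
$$

^6 For a similar result in the context of SDEs, see [Kry09, Theorem 3.1.7, p. 132].
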



where (10a) and (10b) follow from Assumption 4.1.b.ii. and Lemma 3.2, respectively. In
light of the tower property of conditional expectation [Kal97, Theorem 5.1], the arbitrariness of
$u \in \mathcal{U}_t$, and the obvious inequality $V_j \le V_j^*$, we arrive at (9a).

To prove (9b), we define a sequence of uniformly bounded upper semicontinuous functions
$(\phi_j)_{j=k}^{n} \subset \mathrm{USC}(\mathbb{S})$ such that $\phi_j \le V_{j*}$ on $\mathbb{S}$. Mimicking the ideas in the proof of our earlier work
[MECL12, Theorem 4.7], it is possible to establish that, given $\varepsilon > 0$, for all $j \in \{k, \dots, n\}$ there
exists an admissible control policy $u_j$ such that

(11) $\phi_j(t,x) - 3\varepsilon \le J_j(t,x;u_j) \quad \forall (t,x) \in \mathbb{S}.$

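Let us fix $u \in \mathcal{U}_t$ and $\varepsilon > 0$, and define

$$
v_\varepsilon := \mathbf{1}_{[t,\theta]}\, u + \mathbf{1}_{]\theta,T]} \sum_{j=k}^{n} \mathbf{1}_{\{\tau^k_{j-1} \le \theta < \tau^k_j\}}\, u_j, \tag{12}
$$

where $u_j$ satisfies (11). Notice that Assumption 4.1.a. guarantees $v_\varepsilon \in \mathcal{U}_t$. In accordance with
(11), (12) and in light of Assumption 4.1.b., one obtains from the tower property and (10a) that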



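$$
\begin{aligned}
V_k(t,x) \ge J_k(t,x;v_\varepsilon)
 &= \mathbb{E}\bigg[\mathbb{E}\Big[\prod_{i=k}^{n} \ell_i\big(X^{t,x;v_\varepsilon}_{\tau_i^k}\big) \,\Big|\, \mathcal{F}_\theta\Big]\bigg] \\
 &= \mathbb{E}\Big[\sum_{j=k}^{n+1} \mathbf{1}_{\{\tau^k_{j-1} \le \theta < \tau^k_j\}}\, J_j\big(\theta, X^{t,x;u}_\theta; u_j\big) \prod_{i=k}^{j-1} \ell_i\big(X^{t,x;u}_{\tau_i^k}\big)\Big] \\
 &\ge \mathbb{E}\Big[\sum_{j=k}^{n+1} \mathbf{1}_{\{\tau^k_{j-1} \le \theta < \tau^k_j\}}\, \big(\phi_j(\theta, X^{t,x;u}_\theta) - 3\varepsilon\big) \prod_{i=k}^{j-1} \ell_i\big(X^{t,x;u}_{\tau_i^k}\big)\Big].
\end{aligned}
$$
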



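Since $V_{j*}$ is lower semicontinuous, in view of [Ren99, Lemma 3.5] one may pick an increasing
sequence of continuous functions $(\phi_j^m)_{m \in \mathbb{N}}$ that converges pointwise to $V_{j*}$. By boundedness of
$(\ell_j)_{j=1}^{n}$ and the dominated convergence theorem, we get

$$
V_k(t,x) \ge \mathbb{E}\Big[\sum_{j=k}^{n+1} \mathbf{1}_{\{\tau^k_{j-1} \le \theta < \tau^k_j\}}\, \big(V_{j*}(\theta, X^{t,x;u}_\theta) - 3\varepsilon\big) \prod_{i=k}^{j-1} \ell_i\big(X^{t,x;u}_{\tau_i^k}\big)\Big].
$$

Since $u \in \mathcal{U}_t$ and $\varepsilon > 0$ are arbitrary, this implies the assertion (9b). □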

Remark 4.6. Theorem 4.5 introduces DPPs in a weaker sense than the standard DPP in stochastic
optimal control problems [FS06]. Namely, one does not need to verify the measurability
of the value functions $V_k$ in (4) so as to apply the DPPs. Notice that in general this measurability
issue is non-trivial due to the supremum operation running over possibly uncountably many
policies.



5. Applications to Controlled Diffusions 



The objective of this section is to demonstrate how the weak DPP derived in §4 adapts to
the context of controlled diffusion processes. This application results in a series of
Hamilton-Jacobi-Bellman PDEs, where each PDE is understood in the discontinuous viscosity sense with
boundary conditions in both the viscosity and the Dirichlet (pointwise) senses. To this end, we
shall first formally introduce the standard probability space setup for SDEs, then proceed with
some preliminaries so as to pave the ground for the required Assumptions 4.1 to hold. The
section consists of subsections concerning the PDE derivation and boundary conditions, along with
further discussions on how to deploy existing PDE solvers to numerically compute our PDE
characterization.

Let $\Omega$ be $C([0,T], \mathbb{R}^{d_z})$, the set of continuous functions from $[0,T]$ into $\mathbb{R}^{d_z}$, and let $(z_t)_{t \ge 0}$
be the canonical process, i.e., $z_t(\omega) := \omega_t$. We consider $\mathbb{P}$ as the Wiener measure on the filtered
probability space $(\Omega, \mathcal{F}, \mathbb{F})$, where $\mathbb{F}$ is the smallest right continuous filtration on $\Omega$ to which
the process $(z_t)_{t \ge 0}$ is adapted. Let us recall that $\mathbb{F}_t := (\mathcal{F}_{t,s})_{s \ge 0}$ is the auxiliary subfiltration
defined as $\mathcal{F}_{t,s} := \sigma(z_r - z_t,\ t \le r \le t \vee s)$.

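Let $\mathbb{U} \subset \mathbb{R}^{d_u}$ be a control set, and let $\mathcal{U}_t$ denote the set of all $\mathbb{F}_t$-progressively measurable
mappings into $\mathbb{U}$. For every $u = (u_t)_{t \ge 0}$ we consider the following $\mathbb{R}^d$-valued SDE^7:

(13) $dX_s = f(X_s, u_s)\,ds + \sigma(X_s, u_s)\,dW_s, \quad X_t = x, \quad s \ge t,$

where $f : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}^d$ and $\sigma : \mathbb{R}^d \times \mathbb{U} \to \mathbb{R}^{d \times d_z}$ are measurable maps, and $W_s := z_s$ is the
canonical process.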

Assumption 5.1. We stipulate that 

a. $\mathbb{U} \subset \mathbb{R}^m$ is compact;

b. $f$ and $\sigma$ are continuous, and Lipschitz in their first argument uniformly with respect to the
second;

c. The diffusion term $\sigma$ of the SDE (13) is uniformly non-degenerate, i.e., there exists $\delta > 0$
such that for all $x \in \mathbb{R}^d$ and $u \in \mathbb{U}$, $\|\sigma\sigma^\top(x,u)\| > \delta$.

It is well-known [Bor05] that under Assumptions 5.1.a. and 5.1.b. there exists a unique strong
solution to the SDE (13); let us denote it by $(X_s^{t,x;u})_{s \ge t}$. For future notational simplicity,
we slightly modify the definition of $X_s^{t,x;u}$, and extend it to the whole interval $[0,T]$ where
$X_s^{t,x;u} := x$ for all $s$ in $[0,t]$.
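
To make the object $X^{t,x;u}$ concrete, the following is a minimal simulation sketch of a scalar instance of (13) using the Euler-Maruyama scheme. It is an illustration only and not part of the analysis: the function name `euler_maruyama`, the restriction to one dimension, and the use of a Markov feedback law `policy(s, x)` in place of a general progressively measurable policy are all assumptions made here.

```python
import math
import random


def euler_maruyama(f, sigma, policy, t, x, horizon, steps, rng):
    """Simulate one approximate path of dX_s = f(X_s,u_s) ds + sigma(X_s,u_s) dW_s.

    The path starts from X_t = x and is returned on a uniform grid of
    steps + 1 points covering [t, horizon]. `policy(s, x)` is a hypothetical
    feedback law standing in for the control process u_s.
    """
    dt = (horizon - t) / steps
    path = [x]
    s = t
    for _ in range(steps):
        u = policy(s, x)
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over [s, s + dt]
        x = x + f(x, u) * dt + sigma(x, u) * dw
        s += dt
        path.append(x)
    return path


# Example: mean-reverting drift shifted by the control, constant noise level.
example_path = euler_maruyama(
    f=lambda x, u: -x + u,
    sigma=lambda x, u: 0.3,
    policy=lambda s, x: 1.0,
    t=0.0, x=0.0, horizon=1.0, steps=200,
    rng=random.Random(0),
)
```

With the noise switched off the scheme reduces to the explicit Euler method for the drift ODE, which gives a simple sanity check of an implementation.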

In addition to Assumptions 5.1 on the SDE (13), we impose the following assumption on 
the motion planning sets that allows us to guarantee the continuity of sequential exit-times as 
required for the DPP obtained in the preceding section. 

Assumption 5.2 (Exterior Cone Condition). The open sets $(A_i)_{i=1}^{n} \subset \mathfrak{B}(\mathbb{R}^d)$ satisfy the
following condition: for every $i \in \{1, \dots, n\}$, there are positive constants $h$, $r$ and an $\mathbb{R}^d$-valued bounded
map $\eta : A_i^c \to \mathbb{R}^d$ such that

$B_{rt}\big(x + \eta(x)t\big) \subset A_i^c$ for all $x \in A_i^c$ and $t \in (0, h]$,

where $B_r(x)$ denotes an open ball centered at $x$ with radius $r$, and $A_i^c$ stands for the complement
of the set $A_i$.

Remark 5.3. If the set $A_i$ is bounded and its boundary $\partial A_i$ is smooth, then Assumption 5.2
holds. Furthermore, boundaries with corners may also satisfy Assumption 5.2; Figure 5 depicts
two different examples.

5.1. Sequential Partial Differential Equations. This subsection establishes a connection 
between the DPP introduced in Theorem 4.5 and a sequence of PDEs, all of the latter are meant 
in the sense of discontinuous viscosity solutions; for the general theory of viscosity solutions 
we refer to [CIL92] and [FS06]. For numerical solutions to these PDEs, one also needs some 
boundary conditions, and that will be the objective of the next subsection. 



^7 We slightly abuse notation, since $\sigma$ was used earlier for the sigma algebra as well. However, it will always be clear
from the context to which $\sigma$ we refer.

Figure 5. Exterior cone condition of the boundary. (a) Exterior cone condition holds at every
point of the boundary. (b) Exterior cone condition fails at the point x: the only possible
exterior cone at x is a line.

To apply the proposed DPP, one has to make sure that Assumptions 4.1 are satisfied. As 
pointed out in Remark 4.2, the only nontrivial assumption in the context of SDEs is Assumption 
4.1.b.iii. The following proposition addresses this issue, and allows us to employ the DPP of 
Theorem 4.5 for the main result of this subsection. 

Proposition 5.4. Consider the SDE (13) where Assumptions 5.1 hold. Suppose that the open
sets $(A_i)_{i=1}^{n} \subset \mathfrak{B}(\mathbb{R}^d)$ satisfy the exterior cone condition in Assumption 5.2. Let $(\Theta_i^{A_{1:n}})_{i=1}^{n}$ be
the respective sequential exit-times as defined in Definition 3.1. Given intermediate times $(T_i)_{i=1}^{n}$
and control policy $u \in \mathcal{U}_t$, for any $i \in \{1, \dots, n\}$, initial condition $(t,x) \in \mathbb{S}$, and sequence of
initial conditions $(t_m, x_m) \to (t,x)$, we have

$$
\lim_{m \to \infty} \tau_i(t_m, x_m) = \tau_i(t,x) \quad \mathbb{P}\text{-a.s.}, \qquad \tau_i(t,x) := \Theta_i^{A_{1:n}}(t,x) \wedge T_i.
$$

Moreover, the above result readily leads to the continuity of the stochastic
mapping $(t,x) \mapsto X^{t,x;u}_{\tau_i(t,x)}$ with probability one, i.e.,
$\lim_{m \to \infty} X^{t_m,x_m;u}_{\tau_i(t_m,x_m)} = X^{t,x;u}_{\tau_i(t,x)}$ $\mathbb{P}$-a.s. for all $i$.

Proof. See Appendix B. □ 

Definition 5.5 (Dynkin Operator). Given $u \in \mathbb{U}$, we denote by $\mathcal{L}^u$ the Dynkin operator (also
known as the infinitesimal generator) associated to the controlled diffusion (13) as

$$
\mathcal{L}^u \Phi(t,x) := \partial_t \Phi(t,x) + f(x,u) \cdot \partial_x \Phi(t,x) + \tfrac{1}{2} \operatorname{Tr}\big[\sigma\sigma^\top(x,u)\, \partial_x^2 \Phi(t,x)\big],
$$

where $\Phi$ is a real-valued function smooth on the interior of $\mathbb{S}$, with $\partial_t \Phi$ and $\partial_x \Phi$ denoting the
partial derivatives with respect to $t$ and $x$ respectively, and $\partial_x^2 \Phi$ denoting the Hessian matrix
with respect to $x$. We refer to [Kal97, Theorem 17.23] for more details on the above differential
operator.
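
As a quick numerical illustration of Definition 5.5, the sketch below evaluates the Dynkin operator of a scalar diffusion by central finite differences. The function name `dynkin_1d`, the step size `h`, and the test function are illustrative assumptions; the numerical example later in the paper relies on a dedicated PDE toolbox rather than on anything like this.

```python
def dynkin_1d(phi, f, sigma, t, x, u, h=1e-3):
    """Approximate L^u phi(t, x) = phi_t + f * phi_x + 0.5 * sigma^2 * phi_xx.

    All derivatives are replaced by central differences with step h, so the
    result is exact (up to rounding) for polynomials of degree <= 2 in each
    variable and second-order accurate otherwise.
    """
    d_t = (phi(t + h, x) - phi(t - h, x)) / (2.0 * h)
    d_x = (phi(t, x + h) - phi(t, x - h)) / (2.0 * h)
    d_xx = (phi(t, x + h) - 2.0 * phi(t, x) + phi(t, x - h)) / h ** 2
    return d_t + f(x, u) * d_x + 0.5 * sigma(x, u) ** 2 * d_xx


# Hand-checkable case: phi(t, x) = t * x^2, f(x, u) = -x + u, sigma = 0.5.
# Analytically, L^u phi = x^2 + 2 t x (u - x) + 0.25 t.
value = dynkin_1d(
    phi=lambda t, x: t * x * x,
    f=lambda x, u: -x + u,
    sigma=lambda x, u: 0.5,
    t=1.0, x=2.0, u=1.0,
)
```

For the chosen inputs the analytic expression evaluates to $4 - 4 + 0.25 = 0.25$, which the finite-difference value reproduces up to rounding error.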

Theorem 5.6 is the main result of this subsection, which provides a characterization of the
value functions $V_k$ in terms of the Dynkin operator in Definition 5.5 in the interior of the set of
interest, i.e., $[0,T_k[ \times A_k$. This result is a direct consequence of the DPP in Theorem 4.5 and
Itô's formula; for a similar technique we refer to the proof of [MECL12, Theorem 4.10].

Theorem 5.6. Consider the system (13), and suppose that Assumptions 5.1 hold. Let the value
functions $V_k : \mathbb{S} \to \mathbb{R}$, $k = 1, \dots, n$, be as defined in (7), where the sets $(A_i)_{i=1}^{n}$ satisfy
Assumption 5.2, and the payoff functions $(\ell_i)_{i=1}^{n}$ are all lower semicontinuous. Then,



◦ the lower semicontinuous envelope of $V_k$ is a viscosity supersolution of

$$
-\sup_{u \in \mathbb{U}} \mathcal{L}^u V_{k*}(t,x) \ge 0 \quad \text{on } [0,T_k[ \times A_k;
$$

◦ the upper semicontinuous envelope of $V_k$ is a viscosity subsolution of

$$
-\sup_{u \in \mathbb{U}} \mathcal{L}^u V_k^*(t,x) \le 0 \quad \text{on } [0,T_k[ \times A_k.
$$

Proof. We refer to Appendix B for a sketch of proof and [MECL12, Theorem 4.10] for a detailed 
analysis of the same technique. □ 

5.2. Boundary Conditions. To numerically solve the PDE characterization of the previous 
part, one needs some boundary value conditions on the complement set of the PDE, which is 
addressed in the following: 

Proposition 5.7. Suppose that the hypotheses of Theorem 5.6 hold. Then the value functions
$V_k$ introduced in (7) satisfy the following boundary value conditions:

(14a) $V_k(t,x) = V_{k+1}(t,x)\,\ell_k(x)$ on $[0,T_k] \times A_k^c \,\cup\, \{T_k\} \times \mathbb{R}^d$,

(14b) $V_k^*(t,x) \le V_{k+1}^*(t,x)\,\ell_k^*(x)$ on $[0,T_k] \times A_k^c \,\cup\, \{T_k\} \times \mathbb{R}^d$.

Proof. See Appendix B, along with a preparatory lemma. □

Proposition 5.7 provides boundary conditions for the value function $V_k$ not only in the Dirichlet
(pointwise) sense (14a) but also in the discontinuous viscosity sense (14b). Observe that the
value functions $V_k$ are all lower semicontinuous since each of them is the supremum over a family of
lower semicontinuous functions $J_k$; see Lemma B.1. It then follows that the pointwise boundary
condition (14a) gives the other side of the viscosity boundary condition (14b), i.e.,
$V_{k*}(t,x) \ge V_{k+1*}(t,x)\,\ell_{k*}(x)$.

5.3. Discussion on Numerical Issues. The objective of the current section was to propose
an alternative characterization of the value functions $V_k : \mathbb{S} \to \mathbb{R}$ in (7) for the case of controlled
diffusion processes governed by the SDE (13). To this end, §5.1 developed a PDE characterization
of the value function $V_k$ within the set $[0,T_k[ \times A_k$, along with some boundary conditions in
terms of the successor value function $V_{k+1}$ provided in §5.2. Given the value function $V_{k+1}$, to obtain
the value function $V_k$ the only non-trivial set of initial conditions is $[0,T_k[ \times A_k$, for which
it is required to solve a certain PDE with some boundary (possibly both terminal and lateral)
conditions. Since $V_{n+1} = 1$, one can infer that Theorem 5.6 and Proposition 5.7 suggest a series
of PDEs for which the first one has the known boundary condition $\ell_n$, while the boundary
conditions of the subsequent steps are determined by the solution of the preceding PDE step,
i.e., $V_{k+1}$ provides boundary conditions for the PDE corresponding to the value function $V_k$.
Let us highlight once again that the basic motion planning maneuver involving only two sets is
effectively the same as the first step of this series of PDEs and was studied in our earlier work
[MECL12].
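
The recursive structure just described can be sketched in a few lines. The code below is only an outline of the ordering of the steps under stated assumptions: `solve_step` is a hypothetical callback standing in for whatever PDE solver is used on $[0,T_k[ \times A_k$, value functions are represented as plain Python callables, and the boundary data is taken to be $\ell_k V_{k+1}$ as in (14a).

```python
def chain_value_functions(payoffs, solve_step):
    """Return [V_1, ..., V_n] by sweeping k = n, n-1, ..., 1.

    payoffs    : list [ell_1, ..., ell_n] of callables x -> float.
    solve_step : hypothetical solver, called as solve_step(k, boundary) and
                 expected to return V_k as a callable (t, x) -> float, where
                 boundary(t, x) = ell_k(x) * V_{k+1}(t, x).
    """
    successor = lambda t, x: 1.0  # the trivial value function V_{n+1} = 1
    values = []
    for k in range(len(payoffs), 0, -1):
        ell_k = payoffs[k - 1]

        # Default arguments freeze ell_k and the current successor for this k.
        def boundary(t, x, ell=ell_k, nxt=successor):
            return ell(x) * nxt(t, x)

        v_k = solve_step(k, boundary)
        values.append(v_k)
        successor = v_k  # V_k becomes the boundary data of step k - 1
    values.reverse()
    return values


# Degenerate usage: a "solver" that simply returns its boundary data, which
# corresponds to the case where the interior [0, T_k[ x A_k is empty.
call_order = []


def trivial_solver(k, boundary):
    call_order.append(k)
    return boundary


v1, v2 = chain_value_functions([lambda x: 0.5, lambda x: x], trivial_solver)
```

In this degenerate case $V_1$ collapses to the product of the payoffs, which makes the direction of the sweep (last step first) easy to verify.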

Before proceeding to apply the PDE characterization of this section to obtain the desired 
motion-planning initial sets introduced in Definition 2.6, we need to properly justify the following 
two concerns: 

(i) On the one hand, for the definition (2a) we need to assume that the goal set $G_i$ is closed so
as to allow continuous transition into $G_i$; see Assumption 2.2 and the discussion preceding
it. On the other hand, in order to invoke the DPP argument of §4 and its consequent PDE
in §5.1, we need to impose that the payoff functions are all lower semicontinuous;
see Assumption 4.1.c. In the case of the value function $V$ in (4a), this constraint results
in $(G_i)_{i=1}^{n}$ all being open, which is obviously in contradiction to our earlier assumption in
accordance with (2a).

(ii) Due to numerical issues, most existing PDE solvers provide theoretical guarantees
for continuous viscosity solutions, e.g., [Mit05], whereas our characterization is
in discontinuous form. Therefore, it is a natural question whether we could employ an
existing off-the-shelf toolbox to numerically calculate our desired value function.

Let us initially highlight the following points: Concerning (i) it should be mentioned that this 
contradiction is not applicable for the motion-planning initial set (2b) since the goal set Gi can 
be simply chosen to be open without confining the continuous transitions. Concerning (ii), we 
would like to stress that this discontinuous formulation is inevitable since the value functions 
defined in (4) are in general discontinuous, and any PDE approach has to rely on discontinuous 
versions. 

Moreover, we next propose an $\varepsilon$-conservative but precise way of characterizing the motion-planning
initial set of (2a) as well as addressing the numerical concern in (ii) so as to employ the
existing off-the-shelf PDE solvers. Given $(W_i, G_i) \in \mathfrak{B}(\mathbb{R}^d) \times \mathfrak{B}(\mathbb{R}^d)$, let us construct a smaller
goal set $G_i^\varepsilon \subset G_i$ such that $G_i^\varepsilon := \{x \in G_i \mid \operatorname{dist}(x, G_i^c) \ge \varepsilon\}$.^8 For sufficiently small $\varepsilon > 0$ one
may observe that $W_i \setminus G_i^\varepsilon$ satisfies Assumption 5.2. Note that this is always possible if $W_i \setminus G_i$
satisfies Assumption 5.2, since one can simply take $\varepsilon < h/2$, where $h$ is as defined in Assumption
5.2. Figure 6 depicts this situation.




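Figure 6. Construction of the sets $G_i^\varepsilon$ from $G_i$ as described in §5.3.

Formally we define the payoff function $\ell_i^\varepsilon : \mathbb{R}^d \to \mathbb{R}$ as follows:

$$
\ell_i^\varepsilon(x) := \Big(1 - \frac{\operatorname{dist}(x, G_i^\varepsilon)}{\varepsilon}\Big) \vee 0.
$$

Replacing the goal sets and payoff functions in (4a) with $G_i^\varepsilon$ and $\ell_i^\varepsilon$, we arrive at the value function

$$
V^\varepsilon(t,x) := \sup_{u \in \mathcal{U}_t} \mathbb{E}\Big[\prod_{i=1}^{n} \ell_i^\varepsilon\big(X^{t,x;u}_{\eta_i^\varepsilon}\big)\Big],
\qquad \eta_i^\varepsilon := \Theta_i^{B^\varepsilon_{1:n}} \wedge T, \quad B_i^\varepsilon := W_i \setminus G_i^\varepsilon.
$$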



It is straightforward to see that $V^\varepsilon \le V$, since the goal sets $G_i^\varepsilon$ are smaller than
the actual goal sets $G_i$. Moreover, one can show that $V(t,x) = \lim_{\varepsilon \downarrow 0} V^\varepsilon(t,x)$ on the set
$(t,x) \in [0,T[ \times \mathbb{R}^d$, which indicates that the approximation scheme can be arbitrarily precise.
Note that the approximated payoff functions $\ell_i^\varepsilon$ are, by construction, Lipschitz continuous, which,
in light of the uniform continuity of the process (Lemma B.1), leads to the continuity of the value
function $V^\varepsilon$. Hence, the discontinuous PDE characterization of §5.1 is simplified to the continuous
regime, for which there exist PDE solvers for numerical computations. We refer to [MECL12,
Section 5] for a detailed analysis of the proposed approximation scheme.

^8 $\operatorname{dist}(x, A) := \inf_{y \in A} \|x - y\|$, where $\|\cdot\|$ stands for the Euclidean norm.
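
For intuition, the construction of $G_i^\varepsilon$ and $\ell_i^\varepsilon$ can be written out in one dimension. The sketch below assumes, purely for illustration, that the goal set is a closed interval $G = [a,b]$ with $b - a \ge 2\varepsilon$, so that $G^\varepsilon = [a+\varepsilon, b-\varepsilon]$; the function names are hypothetical.

```python
def shrunken_interval(a, b, eps):
    """Return G^eps = {x in [a, b] : dist(x, complement of [a, b]) >= eps}."""
    if b - a < 2.0 * eps:
        raise ValueError("eps is too large: the shrunken goal set would be empty")
    return a + eps, b - eps


def payoff_eps(x, a, b, eps):
    """Lipschitz payoff (1 - dist(x, G^eps) / eps) v 0 for the goal set G = [a, b]."""
    lo, hi = shrunken_interval(a, b, eps)
    dist = max(lo - x, 0.0, x - hi)  # Euclidean distance from x to [lo, hi]
    return max(1.0 - dist / eps, 0.0)


# Sample values for G = [0, 1] and eps = 0.25, i.e. G^eps = [0.25, 0.75].
samples = [payoff_eps(x, 0.0, 1.0, 0.25) for x in (-0.5, 0.0, 0.125, 0.5, 1.0)]
```

The payoff equals one on $G^\varepsilon$, decays linearly with slope $1/\varepsilon$, and vanishes on the boundary of $G$ and outside it, so it never exceeds the indicator of $G$; this is the one-dimensional picture behind $V^\varepsilon \le V$.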

6. Numerical Example: Chemical Langevin Equation for a Biological Switch 

In this section we apply the theoretical results of the preceding sections to a biological switch
network. Multistable biological systems are often seen in nature [BSS01]. The inherent stochasticity
in these systems can be substantial: it influences convergence, and may even lead to
switching behavior from one equilibrium to another. The pioneering works on modeling reactions
in biochemical networks are based on countable-state Markov chains, which describe the evolution
of molecular numbers. Due to the Markov property of chemical reactions, one can track the
time evolution of the probability distribution for molecular populations as a family of ordinary
differential equations called the chemical master equation (CME) [AGA09, ESKPG05]; it is also
known as the forward Kolmogorov equation.

An SDE approach toward modeling the stochastic molecular numbers of species has been
proposed in the literature. It is, of course, assumed that a molecular species number may take
non-integer values. This assumption is usually reasonable for large molecular populations,
whereas for low copy numbers it may not be reliable: a small number, say, 0.1 protein copies can
lead to an entirely different dynamic behavior than one would observe from exactly 0 copies. The
time-continuity of stochastic processes makes the analysis significantly more tractable than the
CME. In this method, one can approximate the molecular numbers via a continuous-time Markov
process, where the latter is an approximation of the jump Markov process that underlies the
CME. This stochastic continuous-time approximation is called the chemical Langevin equation or
the diffusion approximation; see [Kha] and the references therein for further details on modeling
and analysis of stochastic biochemical networks.

Another difficulty with the diffusion approximation is that the model is typically not well-posed,
since it may assign a negative number to a molecular species. Nevertheless, this issue can
be neglected when the focus of observation is away from low numbers of each species. In this
section we consider the following chemical Langevin formulation of a two gene network:

$$
\begin{cases}
dX_t = \big(f(Y_t, u_x) - \mu_x X_t\big)\,dt + \sqrt{f(Y_t, u_x)}\,dW_t^1 + \sqrt{\mu_x X_t}\,dW_t^2, & X_0 = x, \\
dY_t = \big(g(X_t, u_y) - \mu_y Y_t\big)\,dt + \sqrt{g(X_t, u_y)}\,dW_t^3 + \sqrt{\mu_y Y_t}\,dW_t^4, & Y_0 = y,
\end{cases}
\tag{15}
$$

where $X_t$ and $Y_t$ are the concentrations of the two repressor proteins with the respective degradation
rates $\mu_x$ and $\mu_y$; $(W_t^i)_{t \ge 0}$ are independent standard Brownian motion processes. The functions
$f$ and $g$ are repression functions that describe the impact of each protein on the other's rate of
synthesis, controlled via some external inputs $u_x$ and $u_y$.

In the absence of exogenous control signals, the authors of [Che00] study sufficient conditions
on the drifts $f$ and $g$ under which the system dynamics (15) without the diffusion term have
two (or more) stable equilibria. In this case, system (15) can be viewed as a biological switch
network. The aforementioned theoretical results are also experimentally investigated in [GCC00]
for a genetic toggle switch in Escherichia coli.

In this article we consider the biological switch dynamics such that the production rates of
proteins are influenced by some external control signals. The practical feasibility of this assumption
has been demonstrated experimentally in [MSSO+11]. The level of repression is described by
a Hill function, which models cooperativity of binding as follows:

$$
f(y,u) := \frac{\theta_1^{n_1} k_1}{y^{n_1} + \theta_1^{n_1}}\, u, \qquad
g(x,u) := \frac{\theta_2^{n_2} k_2}{x^{n_2} + \theta_2^{n_2}}\, u,
$$

where $\theta_i$ are the thresholds of the production rate with respective exponents $n_i$, and $k_i$ are the
production scaling factors. The parameter $u$ represents the role of external signals that affect the
production rates, for which the control sets are $\mathbb{U}_x := [\underline{u}_x, \overline{u}_x]$ and $\mathbb{U}_y := [\underline{u}_y, \overline{u}_y]$ with nominal
value $u := 1$. As explained in [Che00], the nullclines of the system are $x = f(y,u)/\mu_x$ and $y = g(x,u)/\mu_y$, which
determine whether the system has multiple stable equilibria. In this example we consider system
(15) with the following parameters: $\theta_i = 40$, $\mu_i = 0.04$, $k_i = 4$ for both $i \in \{1, 2\}$, and exponents
$n_1 = 4$, $n_2 = 6$. Figure 7(a) depicts the drift nullclines and the equilibria of the system. The
equilibria $z_a$ and $z_c$ are stable, while $z_b$ is the unstable one. We should remark that the "stable
equilibrium" of SDE (15) is understood in the absence of the diffusion term, as the noise may
very well push the states from one stable equilibrium to another.
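
The equilibria of the drift can be located numerically by intersecting the two nullclines. The following sketch does this with a grid scan followed by bisection, at the nominal input $u = 1$. It assumes the Hill-function form $u\,k\,\theta^{n}/(\cdot^{\,n} + \theta^{n})$ with the parameter values quoted above; the function names, the search range $[0, 100]$ and the grid resolution are illustrative choices, and stability of the roots is not assessed by the code.

```python
THETA, MU, K = 40.0, 0.04, 4.0  # threshold, degradation rate, scaling factor
N1, N2 = 4, 6                   # Hill exponents of f and g


def f(y, u=1.0):
    """Production rate of protein x, repressed by y (assumed Hill form)."""
    return u * K * THETA ** N1 / (y ** N1 + THETA ** N1)


def g(x, u=1.0):
    """Production rate of protein y, repressed by x (assumed Hill form)."""
    return u * K * THETA ** N2 / (x ** N2 + THETA ** N2)


def residual(x):
    """Zero exactly when (x, g(x)/MU) lies on both drift nullclines."""
    y = g(x) / MU
    return x - f(y) / MU


def bisect(func, lo, hi, tol=1e-9):
    """Plain bisection on [lo, hi], assuming func changes sign on the bracket."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def equilibria(x_max=100.0, steps=2000):
    """Scan [0, x_max] for sign changes of the residual and refine each one."""
    grid = [x_max * i / steps for i in range(steps + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        if residual(a) * residual(b) < 0.0:
            roots.append(bisect(residual, a, b))
    return roots


x_equilibria = equilibria()
xy_equilibria = [(x, g(x) / MU) for x in x_equilibria]
```

Under these assumptions the scan returns three equilibria, with $x$-coordinates close to 2.5, between 41 and 42, and just below 100, which is consistent with two outer equilibria and one intermediate equilibrium as described in the text.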




Figure 7. State space of the biological switch (15) with desired motion planning sets.
(a) Nullclines and equilibria of the drift of the SDE (15); $z_a$ and $z_c$ are stable, and $z_b$ is unstable.
(b) The set A is an avoidance region contained in the region of attraction of the stable equilibria $z_a$ and $z_c$,
B is the target set around the unstable equilibrium $z_b$, and C is the maintenance margin.

In this simulation, we first aim to steer the number of proteins towards a target set around
the unstable equilibrium by synthesizing appropriate input signals $u_x$ and $u_y$ within a certain
time horizon, say $T_1$. During this task we opt to avoid the region of attraction of the stable
equilibria as well as low numbers for each protein; the latter justifies our model being well-posed
in the region of interest. The aforementioned target and avoid sets are denoted, respectively, by
the closed sets B and A in Figure 7(b). In the second phase of the task, once the trajectory
visits the target set B, it is required to keep the molecular populations within a slightly larger
margin around the equilibrium for some time, say $T_2$; Figure 7(b) depicts this maintenance margin by
the open set C. In the context of reachability, the second phase is known as viability [Aub91];
for an almost-sure stochastic counterpart see for instance [AD90, AP98].

Let us highlight two technical points here: (i) The set C must be chosen strictly larger than B
in order to allow the process to maneuver inside the interior of the maintenance set C once
it hits the target set B. From a theoretical standpoint, this is necessary since the probability
of hitting the set B and remaining inside B for all future times is zero; this is due to the
non-degeneracy of the SDE (15), see the proof of Proposition 5.4 for a rigorous analysis of
this issue. (ii) Since the process is non-degenerate, in principle the noise can push the process
anywhere in the long run with positive probability. This fact indicates that the probability of
success in the second phase of the motion planning task decreases with the time horizon $T_2$, and
it tends to zero as $T_2$ goes to $\infty$.

Therefore, the motion planning consists of two parts: reaching the target set B while avoiding
the set A within the horizon $T_1$, and staying in the set C for the time $T_2$ after
visiting the set B for the first time. In view of the motion-planning events introduced in Definition
2.1, the first phase of the path can be expressed as $(A^c \leadsto B)_{\le T_1}$ and the second phase as
$(C \xrightarrow{T_2} C)$; see (1) for detailed definitions of these symbols. By defining the joint process
$Z^{t,z;u}_\cdot := [X^{t,x;u}_\cdot, Y^{t,y;u}_\cdot]$ with initial condition $z := (x,y)$, the desired excursion is a combination of the
events studied in the preceding sections and, with a slight abuse of notation, can be expressed
by

$$
\Big\{ Z^{t,z;u}_\cdot \models (A^c \leadsto B)_{\le T_1} \circ (C \xrightarrow{T_2} C) \Big\}.
$$

The above event depends, of course, on the initial condition $(t,z)$ and control policy $u$, and the
objective is to maximize its probability over all admissible policies $u := [u_x, u_y]$. The desired
path is not exactly in the framework of Definition 2.1 but, nonetheless, one can invoke the same
ideas as in §3 and introduce the following value functions:

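$$
V_1(t,z) := \sup_{u \in \mathcal{U}_t} \mathbb{E}\Big[\mathbf{1}_B\big(Z^{t,z;u}_{\tau_1^1}\big)\, \mathbf{1}_C\big(Z^{t,z;u}_{\tau_2^1}\big)\Big], \tag{16a}
$$

$$
V_2(t,z) := \sup_{u \in \mathcal{U}_t} \mathbb{E}\Big[\mathbf{1}_C\big(Z^{t,z;u}_{\tau_2^2}\big)\Big], \tag{16b}
$$

where $\tau_1^1$ and $\tau_2^2$ are defined in the same spirit as (7) with the given sets $A_1 := (A \cup B)^c$ and $A_2 := C$.
However, the stopping time $\tau_2^1$ requires a slight modification so as to address the combination
of both motion-planning events introduced in Definition 2.1: $\tau_2^1 := \Theta_2^{A_{1:2}} \wedge (\tau_1^1 + T_2)$.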

The solution of our motion planning objective is the value function $V_1$ in (16a), which in view
of Theorem 5.6 is characterized by the Dynkin differential operator in the interior of
$[0,T_1[ \times (A \cup B)^c$. However, we first need to solve numerically for the auxiliary value function $V_2$ in (16b) in
order to provide boundary conditions for the PDE corresponding to $V_1$ by

(17) $V_1(t,z) = \mathbf{1}_B(z)\, V_2(t,z), \qquad (t,z) \in [0,T_1] \times (A \cup B) \,\cup\, \{T_1\} \times \mathbb{R}^2.$

It is straightforward to observe that the boundary condition for the value function $V_2$ is

$V_2(t,z) = \mathbf{1}_C(z), \qquad (t,z) \in [0,T_1+T_2] \times C^c \,\cup\, \{T_1+T_2\} \times \mathbb{R}^2.$

Therefore, we need to solve the PDE of $V_2$ with the above boundary condition backward from
the time $T_1 + T_2$ to the time $T_1$, and then at time $T_1$ restrict the value function $V_2$ onto the
set B to provide boundary conditions for the value function $V_1$. Thus, the value function $V_1$
can be computed by solving the same PDE from $T_1$ to 0, but along with different boundary
conditions provided by the preceding step. According to Definition 5.5, for any smooth function
$\phi := \phi(t,x,y)$ the Dynkin operator $\mathcal{L}^u$ can be simplified to



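$$
\begin{aligned}
\sup_{u \in \mathbb{U}} \mathcal{L}^u \phi(t,x,y)
 &= \max_{u \in \mathbb{U}} \Big[ \partial_t \phi + \partial_x \phi\, \big(f(y,u_x) - \mu_x x\big) + \partial_y \phi\, \big(g(x,u_y) - \mu_y y\big) \\
 &\qquad\qquad + \tfrac{1}{2} \partial_x^2 \phi\, \big(f(y,u_x) + \mu_x x\big) + \tfrac{1}{2} \partial_y^2 \phi\, \big(g(x,u_y) + \mu_y y\big) \Big] \\
 &= \partial_t \phi - \big(\partial_x \phi - \tfrac{1}{2} \partial_x^2 \phi\big) \mu_x x - \big(\partial_y \phi - \tfrac{1}{2} \partial_y^2 \phi\big) \mu_y y \\
 &\quad + \max_{u_x \in [\underline{u}_x, \overline{u}_x]} \Big[ f(y,u_x) \big(\partial_x \phi + \tfrac{1}{2} \partial_x^2 \phi\big) \Big]
   + \max_{u_y \in [\underline{u}_y, \overline{u}_y]} \Big[ g(x,u_y) \big(\partial_y \phi + \tfrac{1}{2} \partial_y^2 \phi\big) \Big].
\end{aligned}
$$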



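On account of Theorem 5.6 and the linearity of the drift terms in $u$, one can propose an optimal
policy in terms of derivatives of the value functions $V_1$ and $V_2$ in (16), respectively, for the first
and second phase of the motion:

$$
u_x^*(t,x,y) =
\begin{cases}
\overline{u}_x & \text{if } \partial_x V_i(t,x,y) + \tfrac{1}{2} \partial_x^2 V_i(t,x,y) \ge 0, \\
\underline{u}_x & \text{if } \partial_x V_i(t,x,y) + \tfrac{1}{2} \partial_x^2 V_i(t,x,y) < 0,
\end{cases}
$$

$$
u_y^*(t,x,y) =
\begin{cases}
\overline{u}_y & \text{if } \partial_y V_i(t,x,y) + \tfrac{1}{2} \partial_y^2 V_i(t,x,y) \ge 0, \\
\underline{u}_y & \text{if } \partial_y V_i(t,x,y) + \tfrac{1}{2} \partial_y^2 V_i(t,x,y) < 0,
\end{cases}
$$

where $i \in \{1,2\}$ corresponds to the phase of the path.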
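
The switching rule above is simple enough to state as code. The sketch below assumes that the first and second derivatives of the relevant value function at the current state are already available (for instance from a grid approximation); the function name `bang_bang_input` and the numerical values in the usage lines are hypothetical.

```python
def bang_bang_input(first_derivative, second_derivative, u_min, u_max):
    """Pick the input in [u_min, u_max] maximising u * (d1 + d2 / 2).

    Because the production rate is linear in the input with a non-negative
    coefficient, the maximiser sits at an endpoint of the interval: the upper
    bound when d1 + d2 / 2 is non-negative, the lower bound otherwise.
    """
    switching_value = first_derivative + 0.5 * second_derivative
    return u_max if switching_value >= 0.0 else u_min


# Illustrative derivative values with input bounds [0, 2].
u_x_star = bang_bang_input(first_derivative=1.0, second_derivative=-1.0, u_min=0.0, u_max=2.0)
u_y_star = bang_bang_input(first_derivative=0.2, second_derivative=-1.0, u_min=0.0, u_max=2.0)
```

In the first call the switching value is $1.0 - 0.5 = 0.5 \ge 0$, so the upper bound is selected; in the second it is $0.2 - 0.5 < 0$, so the lower bound is selected.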




Figure 8. The value function $V_2$ as defined in (16b), corresponding to the probability
of staying in C for 120 time units. (a) $V_2$ in case of full controllability over both production
rates. (b) $V_2$ in case of half controllability over only the production rate of protein x.

In the sequel we investigate two scenarios: first, when full control over both production rates is
possible, i.e., $\underline{u}_x = \underline{u}_y = 0$ and $\overline{u}_x = \overline{u}_y = 2$; second, when we only have access to the production
rate of protein x, i.e., $\underline{u}_y = \overline{u}_y = u$. Figure 8 depicts the probability of staying in set
C within the time horizon $T_2 = 120$ time units^9 in terms of the initial conditions $(x,y) \in \mathbb{R}^2$. $V_2$
is zero outside set C, as the process has obviously left C if it starts outside it. Figures 8(a) and
8(b) demonstrate the first and second control scenarios, respectively. Note that in the second
case the probability of success dramatically decreases in comparison with the first. This result
indicates the importance of full controllability of the production rates for the achievement of the
given control objective.

Figure 9 depicts the probability of successively reaching set B within the time horizon $T_1 = 60$
time units and staying in set C for $T_2 = 120$ time units thereafter. Since the objective is to
avoid set A, the value function $V_1$ takes zero value on A. Figures 9(a) and 9(b) demonstrate the
first and second control scenarios, respectively. It is easy to observe the non-smooth behavior
of the value function $V_1$ on the boundary of set B in Figure 9(b). This is indeed a consequence
of the boundary condition explained in (17). All simulations in this subsection were obtained
using the Level Set Method Toolbox [Mit05] (version 1.1), with a $121 \times 121$ grid in the region of
simulation.



^9 Notice that the half-life of each protein is assumed to be 17.32 time units.

Figure 9. The value function $V_1$ as defined in (16a), corresponding to the probability
of staying in C for 120 time units once the process reaches B while avoiding A
within 60 time units ($T_1 = 60$ in both panels). (a) $V_1$ in case of full controllability over the
production rates. (b) $V_1$ in case of inability to increase the production rate of protein y.

7. Conclusion and Future Directions 

In this article we introduced different notions of stochastic motion planning problems for processes with
RCLL sample-path realizations. Based on a class of stochastic optimal control problems, we
proposed an alternative characterization of the set of initial conditions from which there exists an 
admissible policy to execute the desired maneuver with probability at least as much as some pre- 
specified value. We then established a weak DPP, which does not need to verify the measurability 
of the value functions, in terms of some auxiliary value functions. Subsequently, we focused on 
a case of diffusions as the solution of an SDE, and investigated the required conditions to apply 
the proposed DPP in the preceding section. It turned out that invoking the DPP one can solve 
a series of PDEs in a recursive fashion so as to numerically compute the desired initial set as 
well as admissible policy for the motion planning specifications. Finally, the performance of the 
proposed stochastic motion-planning notions was illustrated for a biological switch network. 

For future work, in light of Theorem 4.5 which is in fact developed for RCLL processes, we 
aim to study the required conditions of the proposed DPP, Assumptions 4.1, for a larger class 
of stochastic processes, e.g., controlled Markov jump-diffusions. 

Appendix A. 

This appendix collects the missing proofs of the results presented in §3. 
Proof of Proposition 3.6. We first show (5). Observe that it suffices to prove that

$$
\Big\{ X^{t,x;u}_\cdot \models \big[(W_1 \leadsto G_1) \circ \cdots \circ (W_n \leadsto G_n)\big]_{\le T} \Big\} = \bigcap_{i=1}^{n} \Big\{ X^{t,x;u}_{\eta_i} \in G_i \Big\} \tag{18}
$$

for all initial conditions $(t,x)$ and policies $u$, where the stopping time $\eta_i$ is as defined in (4a).
Let $\omega$ belong to the left-hand side of (18). In view of the definition (1a), there exists a set of
instants $(s_i)_{i=1}^{n} \subset [t,T]$ such that for all $i$, $X^{t,x;u}_{s_i}(\omega) \in G_i$ while $X^{t,x;u}_r(\omega) \in W_i \setminus G_i =: B_i$
for all $r \in [s_{i-1}, s_i[$, where we set $s_0 = t$. It also follows by an induction argument that
$\eta_i(\omega) = \Theta_i^{B_{1:n}}(\omega) = s_i$, which immediately leads to $X^{t,x;u}_{\eta_i(\omega)}(\omega) \in G_i$ for all $i \le n$. This proves the
relation "$\subset$" between the left- and right-hand sides of (18). Now suppose that $\omega$ belongs to the
right-hand side of (18). Then, for all $i \le n$ we have $X^{t,x;u}_{\eta_i(\omega)}(\omega) \in G_i$. In view of the definition of
the stopping times $\eta_i$ in (4a), it follows that $X^{t,x;u}_r(\omega) \in B_i := W_i \setminus G_i$ for all $r \in [\eta_{i-1}(\omega), \eta_i(\omega)[$.
Introducing the time sequence $s_i := \eta_i(\omega)$ implies the relation "$\supset$" between the left- and
right-hand sides of (18). Together with the preceding argument, this implies (18).

To prove (6) we only need to show that

$$
\Big\{ X^{t,x;u}_\cdot \models (W_1 \xrightarrow{T_1} G_1) \circ \cdots \circ (W_n \xrightarrow{T_n} G_n) \Big\} = \bigcap_{i=1}^{n} \Big\{ X^{t,x;u}_{\tilde{\eta}_i} \in G_i \cap W_i \Big\} \tag{19}
$$

for all initial conditions $(t,x)$ and policies $u$, where the stopping time $\tilde{\eta}_i$ is introduced in (4b).
To this end, let us fix $(t,x) \in \mathbb{S}$ and $u \in \mathcal{U}_t$, and assume that $\omega$ belongs to the left-hand side
of (19). By definition (1b), for all $i \le n$ we have $X^{t,x;u}_{T_i}(\omega) \in G_i$ and $X^{t,x;u}_r(\omega) \in W_i$ for
all $r \in [T_{i-1}, T_i]$. By a straightforward induction, we see that $\tilde{\eta}_i(\omega) = T_i$, and consequently
$X^{t,x;u}_{\tilde{\eta}_i(\omega)}(\omega) \in G_i \cap W_i$ for all $i \le n$. This establishes the relation "$\subset$" between the left- and
right-hand sides of (19). Now suppose $\omega$ belongs to the right-hand side of (19). Then, for all
$i \le n$ we have $X^{t,x;u}_{\tilde{\eta}_i(\omega)}(\omega) \in G_i \cap W_i$. By virtue of Fact 3.5 and an induction argument once again,
Assumption 3.4 guarantees that $\tilde{\eta}_i(\omega) = T_i$, and consequently it follows that $X^{t,x;u}_{T_i}(\omega) \in G_i$
and $X^{t,x;u}_r(\omega) \in W_i$ for all $r \in [T_{i-1}, T_i]$. This establishes the relation "$\supset$" in (19), and the
assertion (19) follows. □



Appendix B. 



This appendix contains missing proofs of §5. 



Proof of Proposition (5.4). The key step in the proof relies on the two Assumptions 5. I.e. and 
5.2. There is a classical result on non-degenerate diffusion processes indicating that if the process 
starts from the tip of a cone, then it enters the cone with probability one [RB98, Corollary 3.2, p. 
65] . This hints at the possibility that the aforementioned Assumptions together with almost sure 
continuity of the strong solution of the SDE (13) result in the continuity of sequential exit-times 
n and consequently Tj. In the following we shall formally work around this idea. 

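Let us assume that $t_m \le t$ for notational simplicity; one can follow similar
arguments for $t_m > t$. By the definition of the SDE (13),

$$X^{t_m,x_m;u}_r = X^{t_m,x_m;u}_t + \int_t^r f\big(X^{t_m,x_m;u}_s, u_s\big)\,\mathrm{d}s + \int_t^r \sigma\big(X^{t_m,x_m;u}_s, u_s\big)\,\mathrm{d}W_s \qquad \mathbb{P}\text{-a.s.}$$

By virtue of [Kry09, Theorem 2.5.9, p. 83], for all $q \ge 1$ we have

$$\mathbb{E}\Big[\sup_{r\in[t,T]} \big\|X^{t,x;u}_r - X^{t_m,x_m;u}_r\big\|^{2q}\Big] \le C_1(q,T,K)\,\mathbb{E}\Big[\big\|x - X^{t_m,x_m;u}_t\big\|^{2q}\Big] \le 2^{2q-1} C_1(q,T,K)\,\mathbb{E}\Big[\|x - x_m\|^{2q} + \big\|x_m - X^{t_m,x_m;u}_t\big\|^{2q}\Big],$$

whence, in light of [Kry09, Corollary 2.5.12, p. 86], we get

$$\mathbb{E}\Big[\sup_{r\in[t,T]} \big\|X^{t,x;u}_r - X^{t_m,x_m;u}_r\big\|^{2q}\Big] \le C_2(q,T,K,\|x\|)\big(\|x - x_m\|^{2q} + |t - t_m|^{q}\big). \tag{20}$$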



In the above inequalities, $K$ is the Lipschitz constant of $f$ and $\sigma$ mentioned in Assumption 5.1.b.;
$C_1$ and $C_2$ are constants depending on the indicated parameters. Hence, in view of Kolmogorov's
continuity criterion [Pro05, Corollary 1 Chap. IV, p. 220], one may consider a version of the
stochastic process $X^{t,x;u}$ which is continuous in $(t,x)$ in the topology of uniform convergence on
compacts. This leads to the fact that $\mathbb{P}$-a.s., for any $\varepsilon > 0$, for all sufficiently large $m$,

$$X^{t_m,x_m;u}_r \in B_\varepsilon\big(X^{t_0,x_0;u}_r\big), \qquad \forall r \in [t_m, T], \tag{21}$$

where $B_\varepsilon(y)$ denotes the ball centered at $y$ with radius $\varepsilon$. For simplicity, let us define the shorthand
$\tau_i^m := \tau_i(t_m, x_m)$ (a notation employed only in this proof). By the definition of $\tau_i$ and Definition 3.1, since the set $A_i$ is open, we
conclude that

$$\exists\, \varepsilon > 0, \qquad \bigcup_{s \in [\tau^0_{i-1},\, \tau^0_i[} B_\varepsilon\big(X^{t_0,x_0;u}_s\big) \cap A_i^c = \emptyset \qquad \mathbb{P}\text{-a.s.} \tag{22}$$

By definition $\tau_0^0 := \tau_0(t_0, x_0) = t_0$. As an induction hypothesis, let us assume $\tau_{i-1}$ is $\mathbb{P}$-a.s.
continuous, and we proceed with the induction step. One can deduce that (22) together with
(21) implies that $\mathbb{P}$-a.s. for all sufficiently large $m$,

$$X^{t_m,x_m;u}_r \in A_i, \qquad \forall r \in [\tau^m_{i-1}, \tau^0_i[.$$

In conjunction with $\mathbb{P}$-a.s. continuity of sample paths, this immediately leads to

$$\liminf_{m\to\infty} \tau_i^m := \liminf_{m\to\infty} \tau_i(t_m, x_m) \ge \tau_i(t_0, x_0) \qquad \mathbb{P}\text{-a.s.} \tag{23}$$

On the other hand, as mentioned earlier, the Assumptions 5.1.c. and 5.2 imply that the set of
sample paths that hit the boundary of $A_i$ and do not enter the set is negligible [RB98, Corollary
3.2, p. 65]. Hence

$$\forall \delta > 0, \quad \exists\, s \in \big[\Theta_i^{A_{1:n}}(t_0,x_0),\, \Theta_i^{A_{1:n}}(t_0,x_0) + \delta\big], \qquad X^{t_0,x_0;u}_s \in (A_i^c)^\circ \qquad \mathbb{P}\text{-a.s.,}$$

where $(A_i^c)^\circ$ denotes the interior of the set $A_i^c$. Hence, in light of (21), $\mathbb{P}$-a.s. there exists $\varepsilon > 0$,
possibly depending on $\delta$, such that for all sufficiently large $m$ we have

$$X^{t_m,x_m;u}_s \in B_\varepsilon\big(X^{t_0,x_0;u}_s\big) \subset A_i^c.$$

Recalling the induction hypothesis, we note that in accordance with the definition of the sequential
stopping times $\Theta_i^{A_{1:n}}$, one can infer that $\Theta_i^{A_{1:n}}(t_m, x_m) \le s \le \Theta_i^{A_{1:n}}(t_0, x_0) + \delta$. From the
arbitrariness of $\delta$ and the definition of $\tau_i$, this leads to

$$\limsup_{m\to\infty} \tau_i(t_m, x_m) := \limsup_{m\to\infty} \Big(\Theta_i^{A_{1:n}}(t_m, x_m) \wedge T_i\Big) \le \tau_i(t_0, x_0) \qquad \mathbb{P}\text{-a.s.,}$$

where, in conjunction with (23), $\mathbb{P}$-a.s. continuity of the map $(t,x) \mapsto \tau_i(t,x)$ at $(t_0, x_0)$ for any
$i \in \{1, \dots, n\}$ follows. The assertion follows by induction.

The continuity of the mapping $(t,x) \mapsto X^{t,x;u}_{\tau_i(t,x)}$ follows immediately from the almost sure
continuity of the stopping time $\tau_i(t,x)$ in conjunction with the almost sure continuity of the
version of the stochastic process $X^{t,x;u}_{\cdot}$ in $(t,x)$; for the latter let us note again that Kolmogorov's
continuity criterion guarantees the existence of such a version in light of (20). □
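
The coupling behind estimate (20) and the behaviour of exit times under perturbed initial states can be explored numerically. The following sketch is not part of the paper: it assumes a scalar SDE with the hypothetical coefficients $f(x) = -x$ and a constant $\sigma$, an Euler-Maruyama discretization, and the open set $A = (-\infty, 1)$; all function names are illustrative. Two paths started from nearby states are driven by the same Brownian increments, so in this additive-noise example their sup-distance is governed by the initial gap, and their discrete exit times from $A$ are ordered.

```python
import math
import random


def brownian_increments(n_steps, dt, seed=0):
    """Draw the Brownian increments once so that several paths can share them."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]


def euler_maruyama(x0, increments, dt, sigma=0.5):
    """Euler-Maruyama path of dX = -X dt + sigma dW (hypothetical coefficients)."""
    path = [x0]
    for dw in increments:
        x = path[-1]
        path.append(x + (-x) * dt + sigma * dw)
    return path


def first_exit_index(path, level=1.0):
    """Index of the first grid point outside the open set A = (-inf, level).

    Returns the last index (the horizon T) if the path never leaves A,
    mimicking the truncation tau = Theta ^ T.
    """
    for k, x in enumerate(path):
        if x >= level:
            return k
    return len(path) - 1


def sup_distance(path_a, path_b):
    """Discrete analogue of sup_r |X_r - X'_r| for two coupled paths."""
    return max(abs(a - b) for a, b in zip(path_a, path_b))


dt, n_steps = 1e-3, 2000
dW = brownian_increments(n_steps, dt, seed=1)
x_limit = 0.9
base = euler_maruyama(x_limit, dW, dt)
for gap in (0.1, 0.01, 0.001):
    near = euler_maruyama(x_limit - gap, dW, dt)
    # Columns: initial gap, sup-distance, exit time of perturbed path, exit time of base path.
    print(gap, sup_distance(base, near),
          first_exit_index(near) * dt, first_exit_index(base) * dt)
```

Sharing the increments is the discrete counterpart of comparing $X^{t,x;u}$ and $X^{t_m,x_m;u}$ on the same probability space; with multiplicative noise the sup-distance would no longer equal the initial gap, but it would still be controlled in the sense of (20).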

Proof of Theorem 5.6. Here we briefly sketch the proof in words, and refer the reader to [MECL12,
Theorem 4.10] for detailed arguments that use the same technique to prove the assertion
of the Theorem. Note that any $\mathbb{F}_t$-progressively measurable policy $u \in \mathcal{U}_t$ satisfies
Assumption 4.1.a. It is a classical result [Øks03, Chap. 7] that the strong solution $X^{t,x;u}$ satisfies
Assumptions 4.1.b.i. and 4.1.b.ii. Furthermore, Proposition 5.4 together with almost sure
path-continuity of the strong solution guarantees Assumption 4.1.b.iii. Hence, having met all the
required assumptions of Theorem 4.5, one can employ the DPP (9). Namely, to establish the
assertion concerning the supersolution, for the sake of contradiction, one can assume that there
exist $(t_0, x_0) \in [0, T_k[ \times A_k$ and a smooth function $\phi$ dominated by the value function $V_{k*}$,
where $(V_{k*} - \phi)(t_0, x_0) = 0$, such that for some $\delta > 0$, $-\sup_{u \in \mathbb{U}} \mathcal{L}^u \phi(t_0, x_0) < -2\delta$. Since $\phi$ is
smooth, the map $(t,x) \mapsto \mathcal{L}^u \phi(t,x)$ is continuous. Therefore, there exist $u \in \mathbb{U}$ and $r > 0$ such
that $B_r(t_0, x_0) \subset [0, T_k[ \times A_k$ and $-\mathcal{L}^u \phi(t,x) < -\delta$ for all $(t,x)$ in $B_r(t_0, x_0)$. Let us define the
stopping time $\theta(t,x)$ as the first exit time of the trajectory $X^{t,x;u}$ from the ball $B_r(t_0, x_0)$. Note
that by continuity of solutions to (13), $t < \theta(t,x)$ $\mathbb{P}$-a.s. for all $(t,x) \in B_r(t_0, x_0)$. Therefore,
selecting $r > 0$ sufficiently small so that $\theta < T_k$, and applying Itô's formula, we see that
for all $(t,x) \in B_r(t_0, x_0)$, $\phi(t,x) < \mathbb{E}\big[\phi\big(\theta(t,x), X^{t,x;u}_{\theta(t,x)}\big)\big]$. Now it suffices to take a sequence
$(t_m, x_m, V_k(t_m, x_m))_{m \in \mathbb{N}}$ converging to $(t_0, x_0, V_{k*}(t_0, x_0))$. For sufficiently large $m$ we have
$V_k(t_m, x_m) < \mathbb{E}\big[V_{k*}\big(\theta(t_m, x_m), X^{t_m,x_m;u}_{\theta(t_m,x_m)}\big)\big]$ which, in view of the fact that $\theta(t_m, x_m) < \tau_k \wedge T_k$,
contradicts the DPP in (9a). The subsolution property is proved in a similar fashion. □



In order to provide boundary conditions, in the discontinuous viscosity sense, for the PDE
in Theorem 5.6 we need a preparatory lemma that contains a stronger
assertion than Proposition 5.4.

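Lemma B.1. Suppose that the conditions of Proposition 5.4 hold. Given a sequence of control
policies $(u_m)_{m \in \mathbb{N}} \subset \mathcal{U}$ and initial conditions $(t_m, x_m) \to (t, x)$, we have

$$\lim_{m\to\infty} \Big\| X^{t,x;u_m}_{\tau_i(t,x)} - X^{t_m,x_m;u_m}_{\tau_i(t_m,x_m)} \Big\| = 0 \qquad \mathbb{P}\text{-a.s.,} \qquad \tau_i(t,x) := \Theta_i^{A_{1:n}}(t,x) \wedge T_i.$$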



Note that Lemma B.1 is indeed a stronger statement than Proposition 5.4, as the desired
continuity is required uniformly with respect to the control policy. Let us highlight that the
stopping times $\tau_i(t,x)$ and $\tau_i(t_m, x_m)$ are both affected by the control policies $u_m$. Nonetheless,
the mapping $(t,x) \mapsto X^{t,x;u_m}_{\tau_i}$ is almost surely continuous irrespective of the policies $(u_m)_{m \in \mathbb{N}}$.
For the proof we refer to an identical technique used in [MECL12, Lemma 4.11].



Proof of Proposition 5.7. The boundary condition in (14a) is in a pointwise sense, and is an
immediate consequence of the definition of the sequential exit-times introduced in Definition
3.1. Namely, for any initial state $x \in A_k^c$ we have $\Theta_k^{A_{k:n}}(t,x) = t$, and in light of Lemma 3.2 for
all $i \in \{k, \dots, n\}$

Qf^(t,x)=Qf 



'(t,aO 



V(t,a:) G [0,T fc ] xdA k \J{T k } 



This leads to $X^{t,x;u}_{\tau_k(t,x)} = x$ on the boundaries, and yields the pointwise boundary condition
(14a).
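
The pointwise argument can be mirrored on a discretized path. The sketch below is illustrative only and rests on an assumption about Definition 3.1, which is not restated in this appendix: the sequential exit times are taken to be successive first exit times from $A_k, A_{k+1}, \dots$, each search starting at the previous exit. The function name and the convention of returning the last index when no exit occurs are hypothetical choices.

```python
def sequential_exit_indices(path, sets):
    """Successive first-exit indices of a discrete path.

    `sets` is a list of membership predicates for A_k, ..., A_n. The i-th
    entry of the result is the first index, not earlier than the previous
    exit, at which the path lies outside the i-th set. If the path never
    leaves a set, the last index of the path is used instead (a convention
    chosen for this sketch, playing the role of the horizon).
    """
    exits = []
    start = 0
    last = len(path) - 1
    for in_set in sets:
        k = start
        while k < last and in_set(path[k]):
            k += 1
        exits.append(k)
        start = k
    return exits


def inside_unit_interval(x):
    """Membership test for the hypothetical open set A_k = (-1, 1)."""
    return abs(x) < 1.0


# A path that starts outside A_k exits at index 0, i.e. at the initial time,
# so the state at the first exit time is the initial state itself.
path = [2.0, 0.5, 0.2]
first_exit = sequential_exit_indices(path, [inside_unit_interval])[0]
print(first_exit, path[first_exit])
```

This matches the statement that an initial state in $A_k^c$ gives a first exit time equal to the initial time, which is what pins the value function to its boundary data pointwise.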

The boundary condition (14b) is in a discontinuous viscosity sense, and is a standard boundary
condition in this context. To prove the assertion, let $(t_m, x_m, V_k(t_m, x_m)) \to (t, x, V_k^*(t,x))$
where $t_m < T_k$. Invoking the DPP in Theorem 4.5 once again and introducing $\theta := \tau_{k+1}$ in (9a),
we arrive at

$$V_k(t_m, x_m) \le \sup_{u \in \mathcal{U}_{t_m}} \mathbb{E}\Big[V^*_{k+1}\big(\tau_k, X^{t_m,x_m;u}_{\tau_k}\big)\,\ell_k\big(X^{t_m,x_m;u}_{\tau_k}\big)\Big].$$

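Note that the supremum over all policies in the above inequality can be approached along a
sequence of policies. This sequence, of course, depends on the initial condition $(t_m, x_m)$.
Hence, let us denote it via two indices $(u_{m,j})_{j \in \mathbb{N}}$. One can deduce that there exists a subsequence
of $(u_{m_j})_{j \in \mathbb{N}}$ such that

\begin{align}
V_k^*(t,x) = \lim_{m\to\infty} V_k(t_m, x_m) &\le \lim_{m\to\infty} \lim_{j\to\infty} \mathbb{E}\Big[V^*_{k+1}\big(\tau_k, X^{t_m,x_m;u_{m,j}}_{\tau_k}\big)\,\ell_k\big(X^{t_m,x_m;u_{m,j}}_{\tau_k}\big)\Big] \notag \\
&\le \lim_{j\to\infty} \mathbb{E}\Big[V^*_{k+1}\big(\tau_k, X^{t_{m_j},x_{m_j};u_{m_j}}_{\tau_k}\big)\,\ell_k\big(X^{t_{m_j},x_{m_j};u_{m_j}}_{\tau_k}\big)\Big] \notag \\
&\le \mathbb{E}\Big[\lim_{j\to\infty} V^*_{k+1}\big(\tau_k, X^{t_{m_j},x_{m_j};u_{m_j}}_{\tau_k}\big)\,\ell_k\big(X^{t_{m_j},x_{m_j};u_{m_j}}_{\tau_k}\big)\Big] \tag{24} \\
&= V^*_{k+1}(t,x)\,\ell_k(x), \tag{25}
\end{align}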

where (24) and (25) follow, respectively, from Fatou's lemma and the uniform continuity assertion
in Lemma B.1. Let us recall that by Lemma B.1 we know $\tau_k(t_m, x_m) \to \tau_k(t,x) = t$ as $m \to \infty$
uniformly with respect to the policies $(u_{m,j})_{j \in \mathbb{N}}$. It should be mentioned that the other side
of the discontinuous viscosity boundary condition (14b), i.e. $V_{k*}(t,x) \ge V_{k+1*}(t,x)\,\ell_{k*}(x)$, is an
immediate result of the pointwise boundary condition since the value functions $(V_k)_{k=1}^{n}$ and $\ell_k$
are all lower semicontinuous. □